I. INTRODUCTION


Sufficient experimental evidence has now accumulated to sub- 
stantiate the conclusion that the legibility of the printed page varies 
directly with the amount of brightness contrast between printed sym- 
bols and background. This is true both for chromatic and for achro- 
matic material. 

With a given amount of difference between the luminosity of 
symbols and background, two alternative arrangements are possible, 
light print on a dark background, and dark print on a light background. 
Which of the two is the more legible? Since we find the maximum 
brightness contrast between the extremes of the achromatic scale, 
black and white makes one of the most legible color combinations. 
The question asked above, therefore, most frequently reads, “Which is
the more legible, black print on a white background or white print on a
black background?” This question suggests a further one. Why
should there be any difference in legibility between these two 
arrangements? 

The first question is of considerable practical importance in advertis- 
ing, in the designing of instrument dials and visual test cards, and in 
relation to the legibility of photostats and blueprints. The second 
question is of more general theoretical interest. The present investiga- 
tion constitutes an attempt to answer both questions. 





* A study from the psychological laboratories of the University of Minnesota.
† The writer wishes to express her sincere appreciation to Dr. Miles A. Tinker,
under whose direction this study was completed, for his encouraging suggestions
and criticisms throughout the course of the investigation, for his assistance in the
construction of apparatus, and for his kindness in taking the eye-movement
records of one section of the experiment.
Several previous investigations have dealt with some phase of
the same general problem. Scott, using a short exposure technique,
discovered that black letters on a white background were seen more
frequently than were white on black. Starch found an advantage in
speed of oral reading for black on white as compared with white on
dark grey. Speed of silent reading, as measured by the Chapman-
Cook Speed of Reading Test, was greater for black print than for white
in a study conducted by Paterson and Tinker. In an experiment by
Holmes, isolated five-letter words were recognized at a greater distance
when they appeared as black on white than when they were seen as 
white on black. 

Kirschmann, on the other hand, measuring legibility in terms
of recognizability in peripheral vision, discovered an advantage in favor
of white on black whether the stimulus material consisted of isolated
capital letters or of geometrical forms. The size of the
blind spot was found by Ferree, Rand, and Wentworth to be
somewhat larger for a black stimulus on a white ground than for white
on black. Speed of vision (i.e., the reciprocal of the minimum duration
of exposure necessary for correct discrimination of the position of the
opening in the Landolt broken ring) was faster, according to Ferree and
Rand, for a white test object on a black background than for black on
white. A more minute analysis of their data, however, revealed that 
the time required for the discrimination of detail, after the sensation 
was once set up, was actually longer for the white test object on the 
black background than for the black on white. 


II. PLAN OF THE PRESENT INVESTIGATION 


The contradictory nature of these results, the inconsistency 
and incompleteness of the theories offered to explain them, and the 
variety of procedures and stimulus materials used seemed to justify a 
comprehensive attack on the whole problem which would repeat several 
of the previously used techniques and introduce any new ones that 
seemed to show promise of yielding relevant information. 

The present investigation, accordingly, was planned to repeat the 
short exposure technique of Scott, yielding span of visual apprehension 
as a measure of legibility; the campimetry method used by Kirsch- 
mann, with recognizability in peripheral vision as the criterion of 
legibility; and the distance method of Holmes. In addition, it was 
decided to photograph eye movements during the reading of black 
and white print and from the resulting records determine differences in
the oculomotor performance and speed of reading. 

Several factors, typographical and otherwise, which it was thought 
might affect the direction or the size of the difference in legibility 
between the black and the white print were selected for concomitant 
investigation. These included type size, type face, word form, and 
degree of context or meaning in the stimulus material. 


III. METHODS AND RESULTS 


A. Span of Visual Apprehension 


1. Procedure.—The apparatus and method used in this experiment 
have been described by Taylor and Tinker in an article on the effect 
of luminosity on the apprehension of achromatic stimuli. Data on the 
span of visual apprehension for white letters on a black background 
were collected at the same time as those discussed in the earlier report 
but were omitted from that discussion as irrelevant to the problem 
under investigation at that time. 

In brief, the stimulus material consisted of large block capitals 
(three by four and one-half inches) cut from the Milton Bradley 
Standard Papers, black, white, dark grey, and light grey, and pasted, 
nine letters to a card, on white (or black) cardboards (seven by twenty- 
eight inches). Each of the four brightness combinations appeared 
four times. The cards were exposed for three seconds. The one 
hundred twenty-eight university sophomores who served as subjects 
were tested in groups of about thirty each. 

2. Results.—The quantitative results of this experiment appear
below. The individual scores upon which the means are based were 
determined by totalling the number of letters out of a possible thirty- 
six (four exposures times nine letters per exposure) which had been 
correctly reproduced, irrespective of whether they were misplaced or 
in proper sequence. 
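The scoring rule amounts to counting the overlap between the letters shown and the letters written down, with order ignored. What follows is a minimal Python sketch. It assumes that each card and each written report can be represented as a string of capitals. The function names and the sample cards are hypothetical and are not taken from the study.

```python
from collections import Counter


def card_score(exposed: str, reported: str) -> int:
    """Letters correctly reproduced from one card, ignoring position and order.

    Each letter on the card can be credited at most as many times as it
    appears there (multiset intersection), so repeating a letter earns no
    extra credit.
    """
    overlap = Counter(exposed) & Counter(reported)
    return sum(overlap.values())


def subject_score(cards: list[str], reports: list[str]) -> int:
    """Total over all exposures; four cards of nine letters give a maximum of 36."""
    if len(cards) != len(reports):
        raise ValueError("need one report per card")
    return sum(card_score(c, r) for c, r in zip(cards, reports))


if __name__ == "__main__":
    # Hypothetical cards and reports, for illustration only.
    cards = ["BDFHKMPRT", "ACEGJLNQS", "BEHKNQTWZ", "CFILORUXY"]
    reports = ["BDFHK", "ACEGLN", "ZWT", "CFILO"]
    print(subject_score(cards, reports), "of", 9 * len(cards))  # 19 of 36
```

Using a multiset intersection rather than a set keeps the sketch correct even if a card happened to repeat a letter.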
Differences between these means, intercorrelations between the
scores, and the ratio between each difference and its standard error*
appear below:

                                      r      Difference    D/σD
Black vs. dark grey ...............  0.74      0.07         0.12
Black vs. light grey ..............  0.45      1.34         2.45
Dark grey vs. light grey ..........  0.46      1.41         2.72
[comparison label illegible] ......  0.65      0.86         3.18
White vs. light grey ..............  0.59      0.54         1.93
[comparison label illegible] ......  0.68      0.72         2.77

The figures would seem to justify the following observations: 

1. There is a high positive correlation and almost no difference in 
average score between the apprehension of black and of dark grey 
letters. Since there was only a small difference in brightness (about
sixteen per cent) between black and the particular dark grey used in this
experiment, we may attribute the equality in legibility to the fact that
the black and the dark grey letters offer a large and nearly equal 
contrast with the white background. 

2. Light grey letters were definitely more difficult to apprehend 
than were black or dark grey letters. Although the differences between 
the mean score for light grey letters and the means for each of the 
other two are not three times their standard errors, and thus do not 
meet the conventional criterion for statistical significance, the ratios 
in question are large enough to justify a strong supposition that future 
differences would be in the same direction were the experiment to be 
repeated. The significant factor producing the lower legibility of the 
light grey letters would appear to be their small amount of brightness 
contrast with the white background. 

3. White letters on a black background were quite definitely less
legible than either black on white or dark grey on white, and only
slightly better than light grey on white. We may state it another
way. Maintaining the maximum amount of brightness contrast,
but reversing the positions of black and white from print to back-
ground, caused a greater loss in legibility than did a small decrease in the
amount of brightness contrast (changing from black on white to dark
grey on white) and effected almost as large a disadvantage as did a
very great decrease in brightness contrast (changing from black on
white to light grey on white).

* σD values were computed from the formula for correlated measures.


B. Legibility in Peripheral Vision 
EXPERIMENT I 


1. Procedure.—This experiment was an attempt to duplicate as 
closely as possible Kirschmann’s technique. In campimetry the 
stimulus material approaches the fixation point along a straight arm 
instead of along the arc of a perimeter, thus effecting a decrease both in 
its absolute distance from the eye and in its angular distance from the 
fixation point. The campimeters used in this study were fastened to 
the wall of a dark room, and the fixation points were one meter from 
the eye-hole in a cardboard screen behind which the subject sat. 
One campimeter, its stimulus rider, and its surrounding wall back- 
ground were black with a white fixation point. The other campimeter 
and its background were white with a black fixation point. Readings 
were taken only on the right horizontal meridian for the right eye. 
The illumination measured five foot-candles at the vertical test surface 
and came from a two hundred-watt daylight bulb in the ceiling directly 
above the subject’s head. 
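The stimulus travels along a flat arm rather than along a perimeter arc, so its angular eccentricity is not proportional to the distance read off the arm. The following minimal sketch converts that distance into visual angle and eye-to-stimulus distance. It assumes that the arm lies in the wall plane, perpendicular to the line of sight, with the eye-hole one meter (100 cm) straight in front of the fixation point. The function name is hypothetical.

```python
import math

EYE_TO_FIXATION_CM = 100.0  # fixation points were one meter from the eye-hole


def campimeter_geometry(offset_cm: float,
                        eye_to_fixation_cm: float = EYE_TO_FIXATION_CM) -> tuple[float, float]:
    """Return (eccentricity in degrees, eye-to-stimulus distance in cm).

    Assumes a flat campimeter arm perpendicular to the line of sight, so
    the eye, the fixation point, and the stimulus form a right triangle.
    """
    angle_deg = math.degrees(math.atan2(offset_cm, eye_to_fixation_cm))
    distance_cm = math.hypot(offset_cm, eye_to_fixation_cm)
    return angle_deg, distance_cm


if __name__ == "__main__":
    # The stimulus moves inward from the periphery toward fixation.
    for offset in (40.0, 20.0, 11.66):
        angle, dist = campimeter_geometry(offset)
        print(f"{offset:6.2f} cm from fixation: {angle:5.2f} deg, {dist:6.2f} cm from the eye")
```

As the stimulus moves in from 40 cm to 20 cm along the arm, the eccentricity falls from about 21.8° to about 11.3°. Over the same move, the eye-to-stimulus distance falls from about 107.7 cm to about 102.0 cm. These are the two simultaneous decreases the paragraph describes.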

The stimulus material consisted of the simple block capitals 
reproduced in Kirschmann’s published account of his experiments.
His black letters were drawn in black ink on white cardboard squares. 
His white letters were slightly smaller than the black, to compensate for 
the effects of irradiation, and were cut from thin white cardboard and 
pasted on black velvet squares. Our black alphabet was exactly like 
Kirschmann’s, and was made by tracing his black letters onto white 
cardboard. We made three white alphabets for comparison with the 
black. One was an exact duplicate of the black save that the back- 
ground was inked and the letter left white. One was made by tracing 
Kirschmann’s white alphabet onto white paper, cutting out the letters, 
and pasting them onto black cardboard. A third alphabet was made 
by tracing his white letters onto thin white cardboard and mounting 
the cut-out letters on squares of black velvet. Throughout the remain-
der of this report, these three white alphabets will be designated respec-
tively “drawn,” “pasted,” and “velvet.” In the drawn alphabet,
the white letters were exactly the same size as the black; in the pasted and
velvet alphabets, the white letters were, like Kirschmann’s, slightly 
smaller than the black. 
Six subjects, all graduate students in psychology, read each of
the white alphabets in comparison with the black. The fifty-two 
readings required of each subject were distributed over three days with 
the order of presenting the letters systematically rotated from subject 
to subject so as to control practice and fatigue effects. An initial 
practice series with numerals as stimulus material was inserted for 
each subject to accustom him to the apparatus, the method of making 
the report, and the maintenance of a steady fixation. 

The stimulus always approached the fixation point from the 
periphery, and its distance from the fixation point at the time of its 
correct recognition was recorded to the nearest one-fourth centimeter. 

2. Results.—Data from this experiment are presented in Table I.
The figures appearing in column 2 of the table are the mean distances
(in cm.) from the fixation point at which the letters were recognized,
and are based on a number of cases equal to twenty-six letters times
one reading of each letter times the number of subjects taking part in
the experiment. Results from Kirschmann’s experiment are appended
to facilitate comparisons with ours.

TABLE I.—THE RELATIVE LEGIBILITY OF BLACK AND OF WHITE LETTERS IN
PERIPHERAL VISION

Stimulus material             Mean     σm     Difference   D/σD          Per cent D
(1)                           (2)      (3)    (4)          (5)           (6)

Black—drawn ...............   11.06    0.32
White—drawn ...............   [row illegible]
Black—drawn ...............   11.38    0.38
White—pasted ..............    8.98    0.40    2.40         4.35          26.7
Black—drawn ...............   12.54    0.47
White—velvet ..............    9.64    0.40    2.90         4.70          30.1
Black—drawn ...............   11.66    0.23
White—all kinds ...........   [row illegible]
Black—(Kirschmann) ........   38.55    1.27
White—(Kirschmann) ........   44.85    1.59   −6.30         [illegible]   −16.3
All of our differences* agree in pointing to the black letters on the
white background as more legible (i.e., more easily recognized in
peripheral vision) than the white letters on the black background. In
every case the difference is more than three times its standard error,
and the per cent differences of column 6 (found by dividing the differ-
ence between the averages by the smaller average) indicate relative
differences in favor of the black letters which exceed the sixteen per
cent advantage found by Kirschmann for the white letters.
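The per cent difference defined in parentheses above can be written as a one-line function. In this sketch the sign convention comes from the footnote to this section: a minus sign marks an advantage for white on black. The function name is hypothetical.

```python
def percent_difference(black_mean: float, white_mean: float) -> float:
    """Difference between the averages divided by the smaller average, times 100.

    Positive values favor black on white; negative values (minus sign)
    favor white on black.
    """
    return 100.0 * (black_mean - white_mean) / min(black_mean, white_mean)


if __name__ == "__main__":
    # Kirschmann's own means (cm), as appended to Table I.
    print(round(percent_difference(38.55, 44.85), 1))  # -16.3
    # Black alphabet vs. the white velvet alphabet.
    print(round(percent_difference(12.54, 9.64), 1))   # 30.1
```

Dividing by the smaller average expresses the advantage of the better arrangement relative to the poorer one. The first example recovers Kirschmann's sixteen per cent figure.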


EXPERIMENT II 


1. Procedure.—Our inability to check Kirschmann’s results with a 
repetition of his method led to the hypothesis that the discrepancy 
might have been due to the fact that while each of our subjects read 
each alphabet only once, Kirschmann read each alphabet at least four 
times. Accordingly we conducted a second experiment in which each 
of five subjects read each alphabet (one black and one white) six times. 
The readings were distributed over twelve days and the letters presented 
in a systematically rotated order so that in each successive four-day 
practice period each letter was read twice on each background. Four 
of the five subjects read the white drawn alphabet in comparison with 
the black; the fifth read the white velvet letters. 

2. Results.—Combined data from the first four subjects, presented
by successive four-day practice periods, appear below:†

                       Black mean      White mean      Difference   D/σD   Per cent D
First four days .....  11.66 ± 0.28    10.54 ± 0.29    1.12         2.80   10.6
Second four days ....  12.78 ± 0.28    11.54 ± 0.30    1.24         3.02   10.7
Third four days .....  13.48 ± 0.26    12.14 ± 0.29    1.34         3.44   11.0

From these data it is apparent that practice improved ability to
recognize letters in peripheral vision both when the letters were black
on white and when they were white on black. But the improvement
was greater for the former than for the latter, so that the absolute
difference between the two increased slightly. The per cent differ-
ences show a less marked increase for the later practice periods.

* In this and in all succeeding tables, a difference preceded by a minus sign is a
difference in favor of the white letters (or print) on the black background.
† Here, as in all the tables of this paper, the measure of variability reported in
conjunction with each mean is the standard error of the mean, i.e., σm = σ/√N.

An analysis of the data by individual subjects shows very similar 
trends. In no case was there any tendency for prolonged practice to 
bring about an advantage in favor of the white letters. The one 
subject who read the velvet letters showed the largest advantage of all, 
both absolutely and relatively, in favor of the black alphabet through- 
out the whole experiment. 


C. A Photographic Record of Eye-movements in Reading Black 
and White Print 


1. Procedure.—The stimulus material for this experiment consisted 
of sections, of four consecutive sentences each, cut from the Chapman- 
Cook Speed of Reading Test. Two sections were taken from Form A, 
printed as black on white, and two from Form B, printed white on 
black. The printing was done from reverse zinc etchings on white 
enamel stock. All four sections were read at a single sitting with the 
order of presentation systematically rotated from subject to subject to 
equate practice and fatigue effects. The subjects were twenty univer-
sity sophomores. The eye-movement camera has been fully described
by Tinker.

2. Results.—Analysis of the eye-movement records yielded four
measures of reading performance for each subject. (a) Total number
of fixations made in reading the two sections of black print, or the two
of white. (b) Average number of fixations per line, i.e., the total
number of fixations divided by the number of lines of print in the two
sections. (c) Perception time, i.e., the sum of the durations of the
separate fixations for the two sections, expressed in fiftieths of a second.
(d) Pause duration, i.e., perception time divided by the total number of
fixations. The third of these measures is essentially a measure of
speed of reading.
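The four measures are simple aggregates over the recorded fixations. The following minimal sketch assumes that each subject's record is available as a list of printed lines, each holding its fixation durations in fiftieths of a second. This data layout, the names, and the sample record are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EyeMovementMeasures:
    total_fixations: int        # (a)
    fixations_per_line: float   # (b)
    perception_time: float      # (c) fiftieths of a second
    pause_duration: float       # (d) fiftieths of a second per fixation


def compute_measures(lines: list[list[float]]) -> EyeMovementMeasures:
    """`lines` holds, for each printed line read, the durations of its fixations."""
    durations = [d for line in lines for d in line]
    total = len(durations)
    perception = sum(durations)
    return EyeMovementMeasures(
        total_fixations=total,
        fixations_per_line=total / len(lines),
        perception_time=perception,
        pause_duration=perception / total,
    )


def fiftieths_to_seconds(t: float) -> float:
    return t / 50.0


if __name__ == "__main__":
    # Hypothetical record of three lines (durations in fiftieths of a second).
    record = [[12, 10, 14, 11, 13, 12], [11, 12, 15, 10, 12], [13, 12, 11, 12, 14, 10, 12]]
    print(compute_measures(record))
    # Difference between the mean perception times in the table below, in seconds.
    print(round(fiftieths_to_seconds(1977.3 - 1820.4), 2))  # 3.14
```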

Comparisons of the reading of black and of white print in terms
of these four measures are presented below.*

If we consider the third measure first, we find that the perception
time required for reading the white print on the black background was
significantly greater than that consumed in reading the same amount
of material (two hundred forty words) printed as black on white. In
terms of seconds, the difference between these two means is 3.14;
expressed in terms of words read per second, the difference is 0.79
in favor of the more conventional arrangement.

                                    Black mean        White mean        r      D       D/σD
a. Total No. of fixations .......    150.0 ± 5.28      167.4 ± 5.71     .78    17.4     4.74
b. Av. No. of fix. per line .....     6.14 ± 0.22       6.76 ± 0.23     .80     0.62    4.43
c. Total perception time ........   1820.4 ± 68.78    1977.3 ± 64.11    .84   156.9     4.15
d. Av. pause duration ...........    12.20 ± 0.34      11.90 ± 0.28     .74     0.30    1.30

(Perception time and pause duration are in fiftieths of a second.)

* D/σD values in this experiment were computed from the formula for correlated
measures.

Further examination of the data presented above reveals that the 
longer reading time for the white print is brought about by an increase 
in the number of fixations made rather than by an increase in the pause 
durations. 
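The footnote states that the D/σD values here come from the formula for correlated measures. This sketch assumes the standard form of that formula, σD = √(σm1² + σm2² − 2rσm1σm2). Applied to rows a, c, and d of the table above, it agrees with the printed ratios to within about 0.01. The function and constant names are hypothetical.

```python
import math


def critical_ratio_correlated(m1: float, se1: float, m2: float, se2: float, r: float) -> float:
    """(m2 - m1) / sigma_D for two means obtained from the same subjects.

    sigma_D = sqrt(se1**2 + se2**2 - 2*r*se1*se2), where se1 and se2 are the
    standard errors of the means and r is the correlation between the paired scores.
    """
    sigma_d = math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)
    return (m2 - m1) / sigma_d


# Rows a, c and d of the eye-movement table: (black mean, SE, white mean, SE, r)
EYE_MOVEMENT_ROWS = {
    "a. total fixations": (150.0, 5.28, 167.4, 5.71, 0.78),
    "c. perception time": (1820.4, 68.78, 1977.3, 64.11, 0.84),
    "d. pause duration": (12.20, 0.34, 11.90, 0.28, 0.74),
}

if __name__ == "__main__":
    for name, (mb, sb, mw, sw, r) in EYE_MOVEMENT_ROWS.items():
        ratio = critical_ratio_correlated(mb, sb, mw, sw, r)
        print(f"{name}: D/sigma_D = {abs(ratio):.2f}")
    # Ignoring r (treating the means as independent) shrinks the ratio for row a:
    print(f"row a, r ignored: {17.4 / math.hypot(5.28, 5.71):.2f}")
```

The last line shows why the correlation term matters when the same subjects read both kinds of print. Treated as independent means, the total-fixation difference would give a ratio of only about 2.24 instead of about 4.74.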


D. Distance of Recognizability as a Criterion of Legibility 


1. Procedure.—In the remainder of the experiments in this
investigation legibility was measured in terms of the maximum
distance at which the stimulus material could be read. The apparatus
used in these experiments has been fully described by Tinker. It
consisted, essentially, of an extended bench and a movable carriage
containing the stimulus material. One end of the bench was attached
to a small table bearing, on upright supports, a head rest for the subject.

The testing procedure was identical in all of the experiments 
about to be reported. The stimulus material was inserted, and the 
carriage placed at a distance so great that the subject could read 
nothing. The carriage was then moved closer, by twenty centimeter 
steps, until all of the material had been correctly read. The experi- 
menter recorded the distance from the subject’s eyes, in centimeters, 
at which each item on the stimulus cards was recognized.* 

* Each item was scored at the greatest distance at which it could be recognized.
This “greatest distance” was defined as the first distance at which it was read cor-
rectly providing that the immediately following reading was also correct and pro-
viding, furthermore, that in all of the readings which followed there never occurred
as many as three errors in succession.

All of the stimulus material for these experiments, with one excep-
tion, was printed on white enamel stock from zinc etchings; for one-
half of the material the black ink was applied to the raised letter
outlines; for the other half, using the reverse etching, the raised back-
ground was inked, the letter outlines thus being left white. Individual
test items (letters, words, or sentences) were cut from these printed 
sheets so as to leave a substantial background around each item and 
mounted on black or white cardboards for insertion into the movable 
carriage. In the one exception noted above, Kirschmann’s simple 
block capitals were drawn on white cardboards; on half of the boards 
the letters were blackened with India ink, and on the other half, the 
background was blackened leaving the letters white. 

Illumination of the test material was from four twenty-five-watt 
Mazda day-light bulbs and was uniform and constant at one hundred 
foot candles in all except the first experiment on type size where it 
measured forty foot candles. 

The subjects were largely university sophomores; a few were 
graduate students in psychology. They were tested individually and, 
in the several parts of these experiments, varied in number from ten to 
twenty-four. In each section of this part of the investigation the 
arrangement of the stimulus material upon the cards and the order of 
presenting the cards to the subjects were carefully planned and system- 
atically rotated so as to control practice and fatigue. 

The stimulus material used in these experiments may be briefly 
described as follows: 

(a) Ten of the capital letters of the alphabet (BCEFGHNQRS) 
printed in Scotch Roman type in the following sizes: six, eight, ten, 
twelve, and fourteen point. 

(b) The same ten capital letters, being drawn as simple block
capitals. (See Kirschmann.)

(c) The same ten capital letters printed in Kabel lite type in the 
following sizes: six, ten, and fourteen point. (This is a style of type 
which employs neither hair lines nor serifs and so greatly resembles 
hand lettering. Samples of this, as well as of the Scotch Roman type 
face, appear in Fig. I of an article by Paterson and Tinker on the
influence of type face on speed of reading.) 

(d) Twenty five-letter words and twenty nonsense combinations 
of five letters each (made by spelling the words backwards), printed in 
lower case ten point Scotch Roman type. 

(e) Twelve sentences from the Chapman-Cook Speed of Reading 
Test; six from Form A, black on white, and six from Form B, white on 
black—printed in lower case ten point Scotch Roman type; nineteen 
pica line width set solid. 
(f) Nonsense combinations of the lower case letters l and i (ll, li, il,
ii) printed in Scotch Roman type in three sizes: six, ten and fourteen
point. 

2. Results.—These materials made possible an investigation of the
following factors in relation to the relative legibility of the black and
the white print. The influence of type size was discovered from the use
of materials a, b, c, and f. Type face as a factor in the relative legibility
of the two arrangements was analyzed by comparing the materials in
c with those in a. Word form and its effects were studied through the
material of d. And finally, the influence of the degree of context or
meaning was revealed through comparisons of all of the materials listed.

Table II presents the data from the first experiment concerning the
effect of variations in type size upon the relative legibility of black and 
white print. 

TABLE II.—THE RELATIVE LEGIBILITY OF BLACK AND OF WHITE PRINT AS
AFFECTED BY VARIATION IN TYPE SIZE

Type size            Color of   Mean     σm     Difference   D/σD   Per cent D
                     print
(1)                  (2)        (3)      (4)    (5)          (6)    (7)
6 point ..........   Black      146.8    2.29   10.00        3.12    7.3
                     White      136.8    2.25
8 point ..........   Black      181.8    2.37    8.20        2.30    4.7
                     White      173.6    2.67
10 point .........   Black      231.0    2.75   17.00        4.18    7.9
                     White      214.0    3.00
12 point .........   Black      286.0    3.86   26.00        4.55   10.0
                     White      260.0    4.21
14 point .........   Black      389.4    5.68   11.80        1.51    3.1
                     White      377.6    5.34
Block capitals ...   Black      656.6    7.02   45.40        4.71    7.4
                     White      611.2    6.59

Each of the averages in this table, except the last two, is based 
upon four hundred readings (ten subjects times ten letters times four 
readings of each letter). The last two averages (for the Kirschmann 
simple block capitals) are each based upon six hundred sixty readings 
(eleven subjects times ten letters times six readings of each letter). 
Column 5 shows only one exception to the general rule that with
increase in size of type there is a steady increase in the size of the 
absolute difference between the average distance at which the black 
letters could be read and the average distance at which white letters 
were read. The per cent differences of column 7, however, indicate 
that relative to the magnitudes of the averages involved, the amount 
of inferiority of the white letters is apparently independent of type size. 
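The text does not state how the D/σD values of Table II were obtained. However, they can be reproduced, to within about 0.01, by treating the black and white means as independent, σD = √(σm1² + σm2²). The following sketch recomputes column 6 from columns 3 and 4 on that assumption. The function name is hypothetical, and the row labels assume that the rows run from six to fourteen point, followed by the block capitals.

```python
import math


def critical_ratio_independent(m1: float, se1: float, m2: float, se2: float) -> float:
    """D / sigma_D treating the two means as independent:
    sigma_D = sqrt(se1**2 + se2**2)."""
    return (m1 - m2) / math.hypot(se1, se2)


# Table II, columns 3 and 4, in printed order: (black mean, SE, white mean, SE).
TABLE_II = {
    "6 point": (146.8, 2.29, 136.8, 2.25),
    "8 point": (181.8, 2.37, 173.6, 2.67),
    "10 point": (231.0, 2.75, 214.0, 3.00),
    "12 point": (286.0, 3.86, 260.0, 4.21),
    "14 point": (389.4, 5.68, 377.6, 5.34),
    "block capitals": (656.6, 7.02, 611.2, 6.59),
}

if __name__ == "__main__":
    for size, (mb, sb, mw, sw) in TABLE_II.items():
        ratio = critical_ratio_independent(mb, sb, mw, sw)
        print(f"{size:>14}: D = {mb - mw:6.2f}, D/sigma_D = {ratio:.2f}")
```

Unlike the eye-movement comparison, no correlation term enters here. This is consistent with the black and white letters being scored as separate series.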

Further information on the relation between type size and the 
relative legibility of the two printing arrangements will be brought out 
in the discussion of the next two experiments. 

In Table III appear the data on the influence of type face. Each 
of these averages is based on two hundred forty readings (twelve 
subjects times ten letters times two readings of each letter). If we 
compare the percentage differences of columns 7 and 12 for each of the
three sizes of type, we may note the following facts: 

1. For both the Scotch Roman and the Kabel lite type faces the 
relative inferiority of the white letters on the black background is about 
the same in the ten and the fourteen point sizes, but is considerably 
greater in the six point (smallest) size. 

2. For Scotch Roman type in all three sizes, the white letters are 
very definitely less legible than the black. 

3. For Kabel lite type, in the two larger sizes, the relative differ- 
ence between the legibility of the black and the white print is so slight 
that one may, for all practical purposes, consider it negligible. Only 
in the smallest size (six point) does this type face show the black 
print more legible than the white, and even here, the relative differ- 
ence is not nearly as large as is the corresponding difference for the 
Scotch Roman type. 

Inspection of the averages in Table III (columns 3 and 8) suggests 
an explanation of why there is a greater relative difference between the 
black and the white print when we use Scotch Roman type than when 
we employ Kabel lite type. Note that for the fourteen and the six 
point sizes, the average distance at which the black letters could be 
read was greater for Scotch Roman than for Kabel lite type, and that 
the two averages were about equal in the ten point size. When the 
letters were white on a black background, however, the Kabel lite 
was read at a greater distance than the Scotch-Roman, for all three 
type sizes. 

The implications of these facts are obvious. The most frequently 
used modern type faces, like the Scotch-Roman, all involve serifs which 
are added to the corners of the letters to counteract the blurring effects
of irradiation from the white background. Kabel lite, on the other
hand, is a serifless style of type. Consequently, when we compare the
two type faces, we find that in the black on white arrangement the
serifless type suffers from irradiation, but in the white on black arrange-
ment, it is the type having serifs which is subject to the greatest blur-
ring from irradiation.

TABLE III.—THE RELATIVE LEGIBILITY OF BLACK AND WHITE PRINT
IN SCOTCH-ROMAN AND KABEL LITE TYPES

                           Scotch-Roman type                        Kabel lite type
Type size  Color   Mean    σm    Diff.   D/σD  Per cent D    Mean    σm    Diff.   D/σD          Per cent D
(1)        (2)     (3)     (4)   (5)     (6)   (7)           (8)     (9)   (10)    (11)          (12)
14 point   Black   414.17  5.79                              347.75  7.11
           White   339.42  6.53  74.75   8.56  22.0          345.50  7.82   2.25   0.21           0.6
10 point   Black   241.50  3.23                              244.33  4.40
           White   195.33  3.70  46.17   9.40  23.6          245.67  4.99  −1.34   0.20          −0.5
6 point    Black   165.08  2.61                              154.17  4.12
           White   130.25  2.77  34.83   9.14  26.7          141.25  3.56  12.92   [illegible]    9.1

TABLE IV.—THE RELATIVE LEGIBILITY OF BLACK AND WHITE PRINT.
STIMULUS MATERIAL WITH MINIMAL MEANING AND FORM

                           Standard scoring                          Strict scoring
Type size  Color   Mean    σm    Diff.   D/σD   Per cent D    Mean         σm           Diff.         D/σD   Per cent D
(1)        (2)     (3)     (4)   (5)     (6)    (7)           (8)          (9)          (10)          (11)   (12)
14 point   Black   370.83  3.05                               370.83       3.05
           White   260.00  5.38  110.83  17.93  42.6          242.50       4.86         128.33        22.36  52.9
10 point   Black   247.08  4.53                               242.50       4.75
           White   185.62  7.20   61.46   7.22  33.1          [illegible]  [illegible]  [illegible]    9.46  49.5
6 point    Black   127.50  5.41                               112.08       3.29
           White    72.92  3.26   54.58   8.64  74.8           62.50       2.71          49.58        11.64  79.3

The next experiment involved stimulus material which possessed a
minimum both of meaning and of differential form, i.e., the letter com-
binations ll, li, il, and ii. Because of the opportunity for greater
frequency and accuracy of guessing by the subjects taking part in
this experiment, a new and more rigid scoring system was devised for
these records. According to this “strict” scoring system, the “great-
est distance” at which an item could be recognized was redefined as the
first distance at which it was read correctly providing that there were no errors
in any of the succeeding readings.
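The two scoring rules are easy to confuse, so the following sketch states both explicitly. It assumes that one item's readings are listed from the farthest carriage position to the nearest, in the 20 cm steps described above, with each reading marked correct or not. Several details are assumptions rather than part of the procedure: the function names, the data layout, the sample readings, and the treatment of a correct reading at the last position (where no "following reading" exists).

```python
from typing import Optional

Reading = tuple[float, bool]  # (distance in cm, read correctly?)


def standard_score(readings: list[Reading]) -> Optional[float]:
    """'Standard' rule: the first (farthest) distance read correctly, provided
    the immediately following reading was also correct and the later readings
    never contain three errors in succession."""
    for i, (distance, correct) in enumerate(readings):
        if not correct:
            continue
        later = [ok for _, ok in readings[i + 1:]]
        if later and not later[0]:
            continue  # the very next reading was wrong
        run = longest = 0
        for ok in later:
            run = 0 if ok else run + 1
            longest = max(longest, run)
        if longest < 3:
            return distance
    return None


def strict_score(readings: list[Reading]) -> Optional[float]:
    """'Strict' rule: the first distance read correctly with no error in
    any of the succeeding readings."""
    for i, (distance, correct) in enumerate(readings):
        if correct and all(ok for _, ok in readings[i + 1:]):
            return distance
    return None


if __name__ == "__main__":
    distances = [300, 280, 260, 240, 220, 200, 180]  # hypothetical 20 cm steps
    outcomes = [False, True, False, True, True, False, True]
    readings = list(zip(distances, outcomes))
    print(standard_score(readings), strict_score(readings))  # 240 180
```

In this example, the standard rule credits 240 cm even though an error follows at 200 cm, while the strict rule falls back to 180 cm. Any distance that satisfies the strict rule also satisfies the standard rule, so the strict rule can never credit a greater distance. This matches the slightly smaller strict averages in Table IV.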

Analyses of these data by both the new “strict” and the old
“standard” (previously defined) scoring methods appear in Table IV.
Each of the averages in this table is based upon ninety-six readings 
(four subjects times four stimulus items times six readings of each item). 

1. Again, type size seems to have little effect upon the relative 
legibility of the black and the white print save that here, again, the 
smallest size of type shows the largest relative (per cent) difference 
between the mean distance at which the black print was read and the 
mean distance at which the white print was read. 

2. There is very little difference between the two methods of 
scoring save, of course, that the averages are all a little smaller when 
the strict scoring is used. 

3. Both methods show a very large and statistically reliable advan- 
tage for the black print on a white background in all three type 
sizes. 

4. Although Table IV does not show it, an additional observation 
may be added. Of the four stimulus items used, the two with a char-
acteristic and differentiating general shape, or total “Gestalt” (li and
il), were more easily read than the two which, at great distances, were
indistinguishable rectangles (ll and ii). For the latter two stimuli
(ll and ii), the relative superiority of the black on white arrangement
was much greater than for the two former stimuli (li and il). With
only one exception, this was true for all sizes of type and for both 
methods of scoring. The exception occurred in six point type, stand- 
ard scoring, in which instance the two relative (or per cent) differences 
were approximately equal. 

The influence of word form was tested by comparing the relative 
inferiority of the white print when the stimulus material consisted of 
words with that shown when the stimulus material was nonsense 
combinations of letters. The data essential for these comparisons 
appear as follows: 
                  Black mean        White mean
Words             159.70 ± 1.95     136.28 ± 1.95
Nonsense          109.35 ± 1.55      89.00 ± 1.33











The differences between these means, both relative and absolute, 
and the reliabilities of these differences follow: 











                                      Difference    D/σD    Per cent D
Words:     Black vs. white               23.42       8.49      17.2
Nonsense:  Black vs. white               20.35       9.98      22.9
Black:     Words vs. nonsense            50.35      20.22      46.0
White:     Words vs. nonsense            47.28      20.03      53.1











These results would seem to justify the following observations: 

1. For both words and nonsense material, the black print was
more legible than the white.

2. The per cent difference in favor of the black print was greater 
for nonsense material than for words. 

3. For both black print and white print, the words were read at 
a considerably greater distance than was the nonsense material. 

4. The per cent difference in favor of the words, in comparison 
with the nonsense material, was greater for white print than for black. 

We may state these results in a slightly different way. The larger
relative difference between the mean distances of recognizability
of words and nonsense for white than for black print is not due to the
fact that the auxiliary cues to perception, which are incident to word
form, are of greater assistance when the print is white than when it is
black. It is due rather to the fact that, in the absence of word form,
clear perception of details is hindered more when the print is white than
when it is black; relative to the sizes of the respective averages
involved, the white nonsense material is more inferior to the black
nonsense than are the white words to the black words.
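The differences, critical ratios and per cent differences printed above can be checked against the means. The sketch below rests on three assumptions: the ± values are standard errors of the means, the means being compared are uncorrelated, and per cent D is the difference expressed as a percentage of the smaller mean. Under these assumptions it reproduces the printed differences and per cent values, and it matches the D/σD ratios to within about 0.02. The function name `compare_means` is illustrative.

```python
import math


def compare_means(m1: float, se1: float, m2: float, se2: float):
    """Return (D, D/sigma_D, per cent D) for two independent means, m1 > m2.

    sigma_D is the standard error of a difference between uncorrelated means,
    sqrt(se1**2 + se2**2); per cent D expresses D relative to the smaller mean.
    """
    d = m1 - m2
    sigma_d = math.sqrt(se1 ** 2 + se2 ** 2)
    return d, d / sigma_d, 100.0 * d / m2


# (mean, ± value) pairs as printed for words and nonsense material.
black_words, white_words = (159.70, 1.95), (136.28, 1.95)
black_nonsense, white_nonsense = (109.35, 1.55), (89.00, 1.33)

comparisons = [
    ("Words: black vs. white", black_words, white_words),
    ("Nonsense: black vs. white", black_nonsense, white_nonsense),
    ("Black: words vs. nonsense", black_words, black_nonsense),
    ("White: words vs. nonsense", white_words, white_nonsense),
]
for label, first, second in comparisons:
    d, ratio, pct = compare_means(*first, *second)
    print(f"{label:27s} {d:6.2f} {ratio:6.2f} {pct:5.1f}")
```

Expressing D relative to the smaller mean is what makes the larger per cent value for white print directly comparable across materials of very different legibility.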

We come now to the final problem in this section of the investigation.
What is the influence of the degree of meaning, or context, in
the stimulus material upon the relative legibility of the black and the
white print? In answer to this question we re-present in Table V data
selected from several of the preceding tables, together with data from one
additional experiment not yet reported. This last experiment involved the
reading of sentences from the Chapman-Cook Speed of Reading Test.
The averages for this material are based upon four thousand three hundred
twenty readings each (twenty subjects × six sentences × thirty
words per sentence). All of the stimulus material considered in Table
V was printed in ten-point Scotch Roman type. The averages for
isolated capitals are taken from the second experiment (a comparison
of type faces) rather than from the first experiment (the influence of
type size) because in the first experiment the illumination of the test
material was not the same as in all of the other experiments.


TABLE V.—THE RELATIVE LEGIBILITY OF BLACK AND OF WHITE PRINT
AS AFFECTED BY VARIATION IN DEGREE OF CONTEXT AND MEANING
IN THE STIMULUS MATERIAL

Stimulus material and number       Color of   Mean     σm     Differ-   D/σD    Per cent
of subjects and readings           print                      ence              D
(1)                                (2)        (3)      (4)    (5)       (6)     (7)
Words in sentences                 Black      172.05   0.55
  (20 subjects, 4320 readings)     White      154.99   0.48   17.06     23.37   11.0
Isolated words                     Black      159.70   1.95
  (20 subjects, 400 readings)      White      136.27   1.95   23.42      8.49   17.2
Nonsense material                  Black      109.35   1.55
  (20 subjects, 400 readings)      White       89.00   1.33   20.35      9.98   22.9
Isolated capitals                  Black      241.50   3.23
  (20 subjects, 240 readings)      White      195.33   3.70   46.17      9.40   23.6
Letter combinations (ll-li-il-ii)  Black      247.08   4.53
  (8 subjects, 96 readings)        White      185.62   7.20   61.46      7.22   33.1























The stimulus material is listed in order from the most meaningful
(words in sentences) to the least meaningful (the ll, li, il, and ii
combinations). Note that a decrease in the degree of meaningfulness or
context in the stimulus material is accompanied by an increase in the
size of the relative (per cent) difference between the black and the white
print. The one apparent exception to this trend lies in the near
equality of the differences for nonsense material and isolated capitals.
On reflection, however, it becomes clear that between
these two classes of stimulus material there is really no difference in
meaningfulness; one precedes the other in the table simply because
both cannot occupy the same position.

By far the largest relative inferiority of the white print occurs 
with stimulus material which lacks not only meaning, but also any 
differentiating or characteristic form. 
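The trend in Table V can be checked from the printed means alone. The sketch below assumes that per cent D is the black-minus-white difference expressed as a percentage of the white mean, the convention that reproduces the rows that can be cross-checked. It recomputes the relative inferiority of white print for each class of material in the order given, then tests whether it ever decreases as meaning is removed.

```python
# Means (black, white) from Table V, listed from most to least meaningful material.
rows = [
    ("Words in sentences", 172.05, 154.99),
    ("Isolated words", 159.70, 136.27),
    ("Nonsense material", 109.35, 89.00),
    ("Isolated capitals", 241.50, 195.33),
    ("Letter combinations", 247.08, 185.62),
]


def per_cent_d(black: float, white: float) -> float:
    """Relative inferiority of white print: (black - white) as a per cent of white."""
    return 100.0 * (black - white) / white


pcts = [per_cent_d(b, w) for _, b, w in rows]
for (name, _, _), p in zip(rows, pcts):
    print(f"{name:20s} {p:5.1f}")

# The trend described in the text: per cent D never decreases as meaning is removed.
print(all(later >= earlier for earlier, later in zip(pcts, pcts[1:])))
```

The recomputed values also show how small the step is from nonsense material to isolated capitals, compared with the jump to the letter combinations that lack a characteristic form.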


IV. SUMMARY AND INTERPRETATION OF RESULTS 


As a preliminary to an interpretation of the results of this investiga- 
tion, let us review them briefly: 

1. In the first place, every method employed in this investigation 
yielded results in support of the general conclusion that black print 
on a white background is more legible than white on black. This was 
true whether we measured legibility in terms of span of visual appre- 
hension, speed of reading, recognizability in peripheral vision, or the 
greatest distance at which print could be read. 

2. Type size did not affect the relative difference in legibility 
between the two colors of print except to increase it somewhat for the 
smallest size used. 

3. A serifless style of type (Kabel lite) was equally legible as 
black on white or as white on black in all except the smallest type size. 

4. A progressive decrease in the meaningfulness of the test material
was accompanied by an increase in the relative inferiority of the white
print.

We are now in a position to attempt an answer to the second ques- 
tion asked at the beginning of this paper. Since both involve the 
maximum amount of brightness contrast between symbol and back- 
ground, why should there be any difference in legibility between the 
two printing arrangements, black on white and white on black? 
Without meaning to minimize the undoubtedly large influence of our 
much greater familiarity with the more conventional arrangement, 
the writer wishes to offer the following interpretation as the best 
integration of all the results obtained. In the most frequently used 
modern type faces (of which the Scotch-Roman is a good example), 
the detrimental effects of irradiation from the white background are 
counteracted by the addition of serifs which tend to preserve the char- 
acteristic forms of the letters by emphasizing their corners. When we 
reverse the conventional arrangement so that white letters appear upon
a black background, irradiation will increase the apparent size of the letters;
but it will also tend to blur their outlines, close their open spaces, and
fuse their parts, and will consequently effect a decrease in their
legibility. These effects will be most marked when the letters are
small, when their component strokes are wide in proportion to the
total size of the letter, and when there are no auxiliary cues to
perception such as can arise from meaning or context in the material;
and they should be least noticeable, or even entirely absent, with a
serifless style of type, particularly in the larger sizes.

The practical applications of these results are fairly obvious.
The advertiser who wishes to capitalize on the greater attention value of
the white-on-black arrangement should realize that he will run less
risk of incurring a disadvantage from decreased legibility if he uses
the more novel arrangement only when he can use fairly large sizes of
type, or if, in a situation where the copy involves much reading matter,
he employs a simple, serifless type face. Similar recommendations
might be made to those who design the faces of instrument dials or 
prepare the lettering for photostats and blueprints. 
A METHOD FOR INVESTIGATING THE VALIDITY OF
THE CATEGORIES OF A JUDGMENT TEST


R. A. BROTEMARKLE AND S. W. FERNBERGER


University of Pennsylvania 


The well-recognized need for wide variability in mental tests, as
well as the resulting necessity of analytical interpretation of individual
scores, does not relieve the clinical psychologist of the responsibility of
validating the categories and methods employed in tests. In fact, the
weakest spot of mental testing probably remains the careless method
of internal validation of test materials. If test construction based on a
specific idea yielded significant variability, reliability
was then studied as a necessary evil, but the validity of the experimental
format was almost always presumed from the most meager facts of
reliability.

The present study is a cooperative effort of clinical and experi- 
mental psychology to validate the categories of a judgment test. The 
Roback Judgment Test has been employed for a number of years in 
its present form as test item number eleven in the Roback Mentality 
Tests for Superior Adults. In 1927 Brotemarkle? reported using the 
test as an individual test for didactic purposes in a fundamental 
course in mental analysis and testing, the development of standards, 
and also a study of its use in college student personnel problems. 
Again in 1929 Gillespie and Brotemarkle* reported on the interpolation 
of further norms. Further evidence of reliability was reported by 
Brotemarkle* to the point that the Roback Judgment Test belonged 
to a series of “complex mental processes of intellectual organization”
among which “the point of emphasis in mental testing at the college
adult level is definitely indicated.” 

Roback has reported no further on the material since its inclusion 
in his test and manual for scoring, and then only to indicate scoring 
method. We have continued to compile further norms and indications 
of reliability. Before reporting these, however, we have returned 
to the study of the validity of Roback’s original categories of judgment. 
In attempting this task psychophysics has joined effort with clinical 
psychology. 

The results studied were obtained by standard procedure from a
group of three hundred thirty-five undergraduates (one hundred
ninety-seven men and one hundred thirty-eight women) in a fundamental
laboratory course in clinical psychology. The results conform
to the curve of distribution for the college adult level. The results
were collected by two student workers under the FERA, as directed
by Brotemarkle.

The observed relative frequencies of the different categories of 
judgment for all observers (men and women) will be found in Table I. 
The observed relative frequencies for any category will be found in a 
single horizontal row. The first row gives the relative frequencies 
with which striking sentences were judged in order as striking, com- 
monplace, tautological, jocular and absurd. The last column, marked 
X, gives the relative frequencies with which the observers failed to 
make any judgment. It will be observed that the number of such 
failures is small. 

An examination of this table indicates that, for each of the two 
extreme categories of striking and absurd, the values are highest for the 
appropriate judgment and decrease toward the other extreme, with an 
inversion at the extreme lower end in each case. For the other three 
categories of commonplace, tautological and jocular, the curves are
roughly bell-shaped, with the maximum value at the
place of the appropriate category. In order to fractionate our
results, those for men and women have been separated, and the observed
relative frequencies for these judgments will be found in Tables II
and III.

It will be noted that these values give curves relatively similar 
in form to those obtained in psychophysical experiments, a form of 
curve to which Urban® has given the name of the Psychometric
Functions. It has seemed possible, therefore, to apply the computational
procedures of the method of constant stimuli to these data in the form
suggested by Urban.* Actually these results conform more closely
to the method of single stimuli’ inasmuch as each sentence was judged
in isolation against a scale of judgment categories. These data are,
furthermore, in the nature of the group rather than the individual
threshold, an experimental form suggested by Gordon.

In order to accomplish this, however, it is necessary to make the 
following several assumptions: (1) That the five categories of judgment 
represent equally distant steps of a judgment scale. (2) That it is 
proper to assign arbitrary numerical values from one to five to these 
steps. (3) That the extreme categories of striking and absurd may be 
chosen to represent actual extremes and that the categories of common- 
place, tautological and jocular may be chosen as intermediate
categories. (4) That the no-judgment (X) results be considered as part
of the intermediate category or categories as is the practice with the 
“don’t know” judgments in psychophysical procedures.® 
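Under these assumptions, each extreme category's relative frequencies can be treated as a psychometric function of the numerical stimulus value. Here that function is taken to be a Gaussian ogive with a precision h and a threshold S at which the judgment occurs half the time. The sketch below fits such an ogive by plain unweighted least squares on transformed proportions. This is a simplification: the published constants were presumably computed with Urban's weighted procedure, so this fit should not be expected to reproduce Table IV or Table V exactly. The ordering of the stimulus values (striking = 1 through absurd = 5) is also an assumption here.

```python
import math
from statistics import NormalDist


def fit_ogive(xs, ps):
    """Fit p(x) = 1/2 [1 + erf(h (x - S))] by unweighted least squares.

    Each proportion is transformed to gamma = erfinv(2p - 1), which is linear in
    x (gamma = h*x - h*S). Proportions must lie strictly between 0 and 1.
    Returns (h, S); h is negative for a decreasing function such as striking,
    so its absolute value is the coefficient of precision.
    """
    # erfinv(2p - 1) equals the standard normal quantile of p divided by sqrt(2).
    gammas = [NormalDist().inv_cdf(p) / math.sqrt(2) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_g = sum(gammas) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxg = sum((x - mean_x) * (g - mean_g) for x, g in zip(xs, gammas))
    h = sxg / sxx
    intercept = mean_g - h * mean_x
    return h, -intercept / h


# Stimulus values 1-5 (striking ... absurd) and the relative frequency of the
# judgment "absurd" for each category of sentence, read down column A of Table I.
values = [1, 2, 3, 4, 5]
absurd = [0.17, 0.09, 0.09, 0.23, 0.64]
h, s = fit_ogive(values, absurd)
print(f"h = {h:.2f}, threshold = {s:.2f}")
```

The inversion at the lower end of column A flattens the fitted slope. This is one reason a fit that uses only the two extreme categories can place the thresholds outside the range of stimulus values.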

The calculations of the method of constant stimuli are of advantage 
inasmuch as complete knowledge of the results is obtained by the 
calculation of only the relative frequencies of the two extreme
categories. These calculations give the following final values: (1) Two
values of the coefficient of precision (h₁ and h₂). These values indicate
the steepness of the curves: the higher the value of h, the steeper the
curve. They thus give an index of the relative degree of homogeneity
of judgment. (2) Two values of the threshold or limen, which may be
defined as the most probable stimulus value at which either extreme
judgment will be given correctly with a probability of 0.5, that is,
the stimulus value at which a correct extreme judgment is as likely to
occur as not to occur. (3) The interval of uncertainty, which is
defined as the difference between the two thresholds and within which
neither extreme judgment will be given with a frequency as great as 0.5.
The size of the interval of uncertainty, therefore, is an index of the
sensitivity of the individual or of the group. (4) The point of symmetry
(point of subjective equality), which is the stimulus value
corresponding to the point of intersection of the curves of the
psychometric functions for the two extreme categories and which, Urban has
shown, corresponds to the maximum value of the curve of the psychometric
functions of the intermediate judgments or judgment.
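Given the four constants, the interval of uncertainty and the point of symmetry follow directly. The sketch below assumes that the two extreme-category functions are Gaussian ogives in Gauss's precision form, one falling (striking) and one rising (absurd). Under that assumption, the point of symmetry is the precision-weighted mean of the two thresholds. The example uses the constants printed for the total group in Table V. The function names are illustrative.

```python
import math


def interval_of_uncertainty(s1: float, s2: float) -> float:
    """Distance between the two thresholds (S2 - S1)."""
    return s2 - s1


def point_of_symmetry(h1: float, s1: float, h2: float, s2: float) -> float:
    """Stimulus value where the two extreme-category ogives intersect.

    Assumes p1(x) = 1/2 [1 - erf(h1 (x - s1))] (falling, e.g. striking) and
    p2(x) = 1/2 [1 + erf(h2 (x - s2))] (rising, e.g. absurd). Equal
    probabilities require h1 (x - s1) = h2 (s2 - x), which solves to a
    precision-weighted mean of the two thresholds.
    """
    return (h1 * s1 + h2 * s2) / (h1 + h2)


# Constants for the total group in Table V (S + C and J + A as extremes).
h1, s1, h2, s2 = 0.34, 2.49, 0.47, 3.45
ps = point_of_symmetry(h1, s1, h2, s2)
print(round(interval_of_uncertainty(s1, s2), 2), round(ps, 2))  # → 0.96 3.05

# Check that both ogives give the same probability at the point of symmetry.
p1 = 0.5 * (1 - math.erf(h1 * (ps - s1)))
p2 = 0.5 * (1 + math.erf(h2 * (ps - s2)))
```

Because the steeper curve carries more weight, the point of symmetry lies closer to the threshold of the more homogeneous category.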

These calculated values for the present data, considering striking 
and absurd as extreme categories and grouping commonplace, tauto- 
logical, jocular and no-judgment as intermediate judgments, will be 
found in Table IV. The values for the total results will be found 
in the first row and the fractionated results for men and women will be 
found respectively in the second and third rows. There are no 
significant differences between the values for the fractionated results 
for men and women and, indeed, the similarity throughout is striking. 
When one considers the values for the total results in the first row, it
becomes obvious from the values of h that there
is greater homogeneity of judgment for the absurd category than for
that of striking, because h₂ is greater than h₁. The points of
symmetry, in the last column, give values very close to the central
value of three.

The threshold values for striking (S₁) and absurd (S₂) are
respectively very low and very high. In all three cases for striking, the values
are lower than the numerical value we have assigned to this category,
and in two cases for absurd the threshold values are higher than the
numerical value of this category. In these cases, then, our calculations
have been in the nature of an extrapolation and, on this basis, we
reject this method of grouping.

An alternate and equally acceptable method of grouping is to 
consider absurd and jocular as one single extreme category; striking 
and commonplace as the other single extreme category and tautological 
and no-judgment as a single intermediate category.° On this basis 
we have recalculated the results and the final values will be found in 
Table V, which has a form exactly similar to Table IV. 
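Both groupings amount to summing columns of the observed frequencies before the psychometric functions are fitted. The sketch below shows that regrouping on the Table I values. The dictionary layout and the names `regroup`, `ALONE` and `PAIRED` are illustrative.

```python
# Table I (N = 335): relative frequencies of the judgments S, C, T, J, A and X
# for sentences of each category, copied from the table.
COLUMNS = ("S", "C", "T", "J", "A", "X")
TABLE_I = {
    "S": (0.58, 0.12, 0.05, 0.03, 0.17, 0.04),
    "C": (0.26, 0.56, 0.05, 0.02, 0.09, 0.02),
    "T": (0.04, 0.34, 0.46, 0.05, 0.09, 0.03),
    "J": (0.04, 0.06, 0.04, 0.60, 0.23, 0.02),
    "A": (0.08, 0.06, 0.02, 0.16, 0.64, 0.04),
}


def regroup(row, groups):
    """Sum one row's judgment frequencies into named groups of columns.

    Results are rounded to two decimals, the precision of the printed table.
    """
    freq = dict(zip(COLUMNS, row))
    return {name: round(sum(freq[c] for c in cols), 2) for name, cols in groups.items()}


# First grouping (Table IV): S and A alone as extremes, the rest intermediate.
ALONE = {"S": ("S",), "intermediate": ("C", "T", "J", "X"), "A": ("A",)}
# Second grouping (Table V): S + C and J + A as extremes, T + X intermediate.
PAIRED = {"S+C": ("S", "C"), "T+X": ("T", "X"), "J+A": ("J", "A")}

for category, row in TABLE_I.items():
    print(category, regroup(row, PAIRED))
```

Pooling the adjacent categories removes much of the inversion at the ends of the extreme columns. This is consistent with the thresholds falling inside the stimulus range under the second grouping.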

In all three cases of the total or fractionated results, h₂ is
greater than h₁, which indicates that there is greater homogeneity
of judgment for the absurd-jocular category than for the
striking-commonplace group. The values of the thresholds (S₁ and S₂) are
now observed to be within the range of stimulus values and are
therefore meaningful, as are also the values of the interval of uncertainty.
The values of the points of symmetry are all very close to the central
stimulus value for both total and fractionated results.

All values for the fractionated results for men and women are
similar, but the men seem to have slightly greater homogeneity of
judgment for both extreme categories, as indicated by the higher
values of both coefficients of precision (h₁ and h₂). The men also
have slightly better accuracy of judgment, as indicated by the
slightly smaller interval of uncertainty.

Some time ago these same psychophysical concepts were applied to 
data of the onset of puberty in boys and these concepts seemed to 
have given more meaning to these curves.'! The present results are 
given because again they seem to add something of a methodological 
and definitive character to the test field. It seems that several 
conclusions may be drawn from the foregoing. 

1. Not forgetting our assumptions that the categories of judgment 
were equidistant and that it was proper to assign numerical values to 
them, it would seem, from the nature of the results, that these assump- 
tions were justified. This method of handling the results gives us, 
then, a method for testing the validity of the categories of the judg- 
ments of the test. 

2. The Roback categories are internally valid for the purpose of 
test construction as at present employed. 


3. It gives us a more definitive method of comparing the judgments
of different groups. Our fractionated data are similar, for the groups
were homogeneous, being all university students and differing only
in the matter of sex. Even so, it would appear that differences can be
indicated by this method which could not have been determined by a
mere consideration of the raw relative frequencies. It would seem
that if two groups as dissimilar as normal and insane were compared,
we might expect to find significant differences in the final
calculated values which would give a better and more definitive picture
of these differences than a mere consideration of the relative or actual
frequencies of judgments.

4. There have been several attempts recently to establish a
psychophysics of phenomenology, and these results indicate the possibility
of a psychophysics of logic.
logical statistical concepts.” Ped. Sem., Vol. XXIII, 1916, pp. 360-366.
TABLE I.—OBSERVED RELATIVE FREQUENCIES OF JUDGMENT (MEN AND WOMEN,
N = 335)

Categories    S      C      T      J      A      X
S           0.58   0.12   0.05   0.03   0.17   0.04
C           0.26   0.56   0.05   0.02   0.09   0.02
T           0.04   0.34   0.46   0.05   0.09   0.03
J           0.04   0.06   0.04   0.60   0.23   0.02
A           0.08   0.06   0.02   0.16   0.64   0.04











TABLE II.—OBSERVED RELATIVE FREQUENCIES OF JUDGMENT (MEN, N = 197)

Categories    S      C      T      J      A      X
S           0.62   0.12   0.06   0.03   0.14   0.04
C           0.25   0.58   0.04   0.02   0.09   0.00
T           0.04   0.31   0.48   0.05   0.10   0.03
J           0.04   0.07   0.03   0.62   0.23   0.02
A           0.06   0.06   0.02   0.14   0.67   0.05








TABLE III.—OBSERVED RELATIVE FREQUENCIES OF JUDGMENT (WOMEN, N = 138)

Categories    S      C      T      J      A      X
S           0.53   0.13   0.04   0.04   0.19   0.06
C           0.28   0.52   0.06   0.03   0.10   0.02
T           0.05   0.38   0.42   0.05   0.07   0.03
J           0.04   0.06   0.05   0.58   0.23   0.01
A           0.09   0.06   0.02   0.18   0.59   0.06

TABLE IV.—CALCULATED CONSTANTS FROM VALUES OF A AND S ALONE

                          h₂      h₁      S₂      S₁      I of U   PS
Total men and women      0.33    0.22    5.07    0.20    4.87     3.13
Men                      0.387   0.26    4.95    0.54    4.41     3.14
Women                    0.30    0.19    5.26    0.68    4.54     3.22

















TABLE V.—CALCULATED CONSTANTS FROM VALUES OF S + C AND J + A

                          h₂      h₁      S₂      S₁      I of U   PS
Total men and women      0.47    0.34    3.45    2.49    0.96     3.05
Men                      0.50    0.37    3.44    2.54    0.90     3.05
Women                    0.42    0.30    3.54    2.43    1.11     3.07
THE EDUCATIONAL ACHIEVEMENT OF A GROUP OF
GIFTED NEGRO CHILDREN


PAUL A. WITTY AND MARTIN D. JENKINS 


Northwestern University 


Although the literature of quantitative measurement is replete with 
studies of the mental ability of Negroes, there have been reported few 
comprehensive investigations of the educational achievement of Negro 
children. Almost without exception, the median intelligence test 
scores of Negro groups have fallen conspicuously below the averages 
of various white populations. This literature has been reviewed
recently by a number of writers (Price,'* Garth,’® Witty and Lehman,”* 
Yoder”). Similarly, in school achievement the average Negro child is 
invariably described as being inferior to the average white child 
(Wilkerson). While competent psychologists are becoming increas- 
ingly cognizant that our present measuring instruments and test norms 
are of doubtful validity in such comparisons, the uncritical have made 
sweeping generalizations concerning the “lack of educability” and the
general constitutional inferiority of Negro children. Furthermore, 
attention to the array of low average scores has diverted attention from 
the fact that there are conspicuous and significant individual differ- 
ences within the Negro groups. Especially unfortunate has been the 
lack of recognition or appreciation of the presence of unusually capable 
children in Negro school populations. One leaves the literature with 
the impression that the Negro child constitutes hopeless school mate- 
rial, and that he is destined for a low educational estate. In addition, 
one is led to conclude that gifted Negro children are so anomalous in 
the public school that diligent search for them would prove unprofitable. 

All persons who have taught Negro pupils will attest, perhaps, that 
some of the children have exceptional scholastic ability; however, in so 
far as the writers can discern, no study dealing with the educational 
achievement of exceptional Negro children has been published. This 
paper presents a detailed analysis of the educational achievement, as 
measured by the New Stanford Achievement Test, Form W, of twenty- 
six Negro children whose IQ’s (Stanford-Binet) are one hundred forty 
or above. These children were chosen as the brightest from a group 
of more than one hundred Negro children of superior intelligence 
(Stanford-Binet IQ one hundred twenty and above). The educational 
attainment of this group will be compared with that of groups of 
gifted children which have been studied by Terman, Hollingworth,
Witty, and others. 


SELECTION OF SUBJECTS AND VALIDITY OF THE MEASURES


The twenty-six subjects were identified in a systematic search for 
superior Negro children in grades three to eight of seven public schools 
of Chicago. In these schools approximately eight thousand Negro 
pupils are enrolled. The gifted pupils of this study are distributed 
among six of the schools, since in one of the schools no gifted child was 
found. The method of selection was similar to that used by Terman® 
—classroom teachers nominated the following children: (1) The child 
thought most intelligent, (2) the child doing the best class work, and 
(3) children one or more half-years under age for their grades. The 
McCall Multi-Mental Scale was administered to all of the nominees, 
and the Stanford-Binet test was then given to every child who had 
been credited with an IQ of one hundred twenty or more upon the 
McCall Scale. By this method twenty-six children of Stanford-Binet
IQ one hundred forty or above were located. The New Stanford
Achievement Test, Form W, was given to the twenty-six gifted children. 
The test was administered during a single day, and, to avoid fatigue, 
appropriate rest periods were provided. 

The New Stanford Achievement Test includes a fair sampling of 
the subject-matter stressed in the subject-fields of reading, spelling, 
language usage, literature, history and civics, geography, physiology 
and hygiene, and arithmetic. Although one may state that this test 
does not measure important concomitants of education, or even the 
direct outcomes of progressive curricula, it appears that the test is 
an appropriate instrument for gauging the educational gains effected 
by the traditional course of study followed in the schools which the 
children attend. 

Familiarity with tests is always a factor to be considered in estimat- 
ing suitability of a test for any group. This group of children is not 
“test-wise”; few have had any experience previously with standardized
tests. 


AGE-GRADE DISTRIBUTIONS 


In Table I, the children are assembled by age and grade. There is a
rather continuous distribution throughout the grades, with no grade
contributing an unusually large number of subjects. The median of
the grade distribution is 5.7. There is also a rather even age spread,
the ages ranging from six years, nine months to thirteen years, five
months. The median age is ten years, two months.


TABLE I.—AGE AND GRADE PLACEMENT OF TWENTY-SIX GIFTED NEGRO
CHILDREN, GRADES III-VIII

                         Present school grade
Chronological age    III   IV   V   VI   VII   VIII    Total

The extent to which the group is accelerated in school is revealed
by the progress quotient (PQ).* The mean PQ is one hundred twenty 
(SD = 8.8); this indicates that the typical child of this group is accel- 
erated twenty per cent of his age beyond the age norm for unselected 
children of his age. Terman found the progress quotient of his group 
to be one hundred fourteen (Terman, op. cit., p. 257) and Witty, 
one hundred sixteen (Witty,?? p. 19). 

This relatively high progress quotient is attributable in part to the 
fact that the Chicago schools accelerate more pupils than does the 
typical school system (cf. Strayer,” Vol. V, p. 85, and Mort & 
Featherstone!*). 
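The footnote defines the progress quotient as a ratio of ages. The sketch below assumes the ratio is multiplied by 100, as with the IQ, so that 100 marks normal progress; the reported mean of 120 implies this. It also takes "mean age of children in each grade" to mean the typical age for that grade. The ages are hypothetical, not data from this study.

```python
from statistics import mean


def progress_quotient(mean_age_of_grade: float, child_age: float) -> float:
    """PQ = 100 * mean age of children in the child's grade / the child's age."""
    return 100.0 * mean_age_of_grade / child_age


# Hypothetical children (ages in years): (mean age of their grade, own age).
children = [(12.0, 10.0), (11.5, 9.5), (13.0, 11.0)]
pqs = [progress_quotient(grade_age, age) for grade_age, age in children]
print([round(pq, 1) for pq in pqs], round(mean(pqs), 1))
```

A child two years younger than the typical member of the grade, at age ten, thus receives a PQ of 120, the group mean reported above.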


STANFORD ACHIEVEMENT TEST RESULTS 


The mean educational attainment of the pupils upon each of the 
sub-tests of the Stanford Achievement Test is shown in Table II. 
The data show, as have those of all other studies of the educational 
achievement of gifted children, that these children have mastered 





* The individual progress quotient (PQ) was calculated by dividing the mean 
age of children in each grade by the age of each child. 
educational subject-matter in excess of their present grade placement.
In only one case is there a negative difference between grade status as 
indicated by the test results and present grade placement. The 
average pupil of this group has mastered the subject-matter 1.4 grades 
above his present grade placement; the range is from −0.5 to +3.6
grades. The educational opportunity for this group seems decidedly 
inadequate and restricted: these children are now studying subject 
matter which they have mastered, and are pursuing methods which 
they have outgrown. 

As no control group was used it is difficult to estimate accurately 
the degree of acceleration of the children. There is evidence, however, 
that a group typical of the population from which this group was 
selected will not attain the grade norms upon the New Stanford Test. 
(Several Chicago schools containing an enrollment predominantly of
Negroes were excluded from this study; these were schools located 
in areas characterized by low social-economic milieu. The seven 
schools of this study contain, in the opinion of the writers, children who 
come from homes somewhat above the average of the homes of the 
entire Negro population of Chicago.) Bousfield? administered the 
reading and arithmetic tests of the New Stanford Achievement Test to 
two hundred twenty-two Negro pupils in one of the elementary schools 
of Chicago not included in this study. Bousfield’s group was retarded 
one grade in reading and about one-half grade in arithmetic. Beck- 
ham! gave the reading and dictation sections of the New Stanford 
Achievement Test to one hundred unselected eighth grade pupils in one 
of the schools included in our study. He found the group to be more 
than one-half grade below the norms in reading and in spelling. If one 
assumes that the composite of these studies represents the typical 
educational condition of the school populations from which the gifted 
group was selected, it becomes apparent that the educational superior- 
ity of these children must produce a distinctly serious problem since the 
classmates of the gifted are educationally retarded, and since work is 
doubtless articulated around the assumed needs of the average child in 
the entire group. The superiority of the gifted children is strikingly 
displayed in the degree to which they excel the test norms for children 
of their chronological age. The typical child in our group has attained 
an educational development more than three grades (3.3) in excess of 
the norms for children of his chronological age. Two of the children 
have mastered educational subject-matter more than five grades 
beyond that which is the norm for children of their ages. These 
facts assume added significance when one considers that our typical
gifted child has been in school only about five years. 

Table II shows means and the SD’s of the educational and subject 
quotients.* The mean educational quotient (EQ) is 133.7, and the 
range of EQ is from one hundred fourteen to one hundred sixty-nine. 
Mean subject quotients vary from 126.5 in arithmetic computation to 
146.6 in language usage. Particularly noticeable in Table II are the 
low subject quotients in arithmetic and the high ones in language usage 
and reading. Great variability is also reflected in the SD’s of certain
of the subject quotients. The large SD in language
usage is especially conspicuous. Scrutiny of Table III leads to other
significant speculations. The children appear to achieve best in those
subjects which are apparently least dependent for their development on
classroom instruction.


TABLE II.—MEANS AND SD’S OF THE SUBJECT QUOTIENTS OF TWENTY-SIX GIFTED
NEGRO CHILDREN

Subject                                   Mean SQ     SD
[illegible]                                145.7     12.4
[illegible]                                141.6     12.4
[illegible]                                143.8      9.7
[illegible]                                137.2     13.1
Language usage                             146.6     18.9
[illegible]                                133.4     18.5
[illegible]                                128.4     19.1
Physiology and hygiene                     131.3     17.1
[illegible]                                131.4     15.4
[illegible]                                126.7     13.6
Arithmetic computation                     126.5     17.7
Average arithmetic                         127.3     12.8
Educational quotient (EQ)                  133.7     11.8











Furthermore, the high reading and language quotients suggest a 
superiority in reading ability in these gifted Negro children almost as 
marked as that in mental ability. 

The meagre arithmetic skills have already been noted. The mean 
SQ for the total arithmetic test is 127.3. The disparity between the 
arithmetic quotients and the other subject quotients has educational 
implications for the teachers of these children. If these are “‘typical”’ 





* EQ here refers to composite attainment (EA) divided by CA. Subject 
quotient refers to subject age (SA) divided by CA. 
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gifted Negro children, they exemplify certain educational attainments 
which depart markedly from the characteristic achievements of 
unselected Negro children in the elementary school. For unselected 
Negro children, attainment in reading skills and habits is frequently 
reported to be low, and achievement in arithmetic computation to be 
relatively high. Several studies of the educational achievement of 
Negro children (Bousfield, Foreman, Witty and Decker) suggest
that typical Negro children succeed better in arithmetic than in 
reading.* 


THE ACCOMPLISHMENT QUOTIENTS 


Since some writers believe that the “gifted” frequently become
intellectual vagabonds in our schools, the writers employed the
accomplishment quotient technique to ascertain whether this
allegation applied to gifted Negro children. The accomplishment
quotient is derived by dividing the EQ by the IQ and multiplying by
one hundred. Deviations above and below one hundred presumably
reflect the extent to which mental ability is expended in the tasks
of the school as well as the degree to which effort is put forth.
AQ’s may be calculated for the composite achievement and for
particular subject achievements.
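A minimal sketch of the three ratios, assuming ages are expressed in months and that each quotient is the ratio multiplied by one hundred, as the values near one hundred in this section imply. The pupil's ages and the helper names are hypothetical, chosen only to make the arithmetic easy to follow. The sketch also shows why the AQ reduces to EA divided by MA: the chronological ages cancel.

```python
def ratio_quotient(age_equivalent: float, chronological_age: float) -> float:
    """Age-equivalent score divided by chronological age, times 100.

    Gives the EQ when the first argument is the educational age (EA),
    a subject quotient (SQ) for a subject age (SA), and a ratio IQ for
    a mental age (MA). Both ages must be in the same unit.
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * age_equivalent / chronological_age


def accomplishment_quotient(eq: float, iq: float) -> float:
    """AQ = 100 * EQ / IQ, which equals 100 * EA / MA because CA cancels."""
    if iq <= 0:
        raise ValueError("IQ must be positive")
    return 100.0 * eq / iq


# Hypothetical pupil, ages in months (not a child from this study).
ca_months = 120  # chronological age: 10 years 0 months
ea_months = 160  # educational age from the achievement test
ma_months = 180  # mental age from the Stanford-Binet

iq = ratio_quotient(ma_months, ca_months)
eq = ratio_quotient(ea_months, ca_months)
aq = accomplishment_quotient(eq, iq)

print(f"IQ={iq:.1f} EQ={eq:.1f} AQ={aq:.1f}")  # IQ=150.0 EQ=133.3 AQ=88.9
```

Plugging in the group means (EQ 133.7, IQ 148.9) gives about 89.8 rather than the reported mean AQ of 91. That is to be expected: the mean of the pupils' individual AQ's need not equal the ratio of the group means.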

The writers are well aware of the limitations of this technique
(cf. Davis and Campbell, Keys and Whiteside, Popenoe, Wilson). Many
investigators have found negative correlations between
intelligence quotients and educational quotients. The accomplish- 
ment quotient technique apparently sets an expected measure of 
attainment which is too high for the bright to hope to attain in the 
traditional school. Despite its limitations, the accomplishment 
quotient concept is useful in educational measurement, securing, as 
it does, an approximation or crude measure of the extent to which 
pupils are achieving in terms of their mental abilities. The AQ’s are 
presented here chiefly for comparison with the results of other studies. 

Since gifted Negro children do better in reading than in other 
subjects, a relatively high accomplishment quotient in reading would 
be anticipated. The mean AQ in reading is 97.2; the range is from 
eighty-nine to one hundred nine; and six pupils have AQ’s at or 
above one hundred. The mean accomplishment quotient in reading 
is appreciably higher than the accomplishment quotient for the entire
test. Furthermore, the variability of the test scores about the mean
is noticeably small.

* Wilkerson, reviewing the studies of the educational achievement of Negro
children, finds that this relationship is not invariably reported.

The composite AQ’s range from seventy-nine to one hundred three, 
with a mean of ninety-one, and an SD of 5.8; only two children have 
AQ’s one hundred or above. Without doubt the children are not 
challenged by their present educational opportunity. They are 
placed usually in classes with pupils whose ability does not provide 
competition, and they are being educated in a system which makes 
almost no provision for pupils of “surpassing” ability. In some 
instances, they are being taught by teachers who have no idea that 
pupils of such extraordinary superiority exist. Months of observa- 
tion of the children and the classes have convinced the writers that 
these children (as a group) have little or no opportunity to use their 
unusual capacities in their present school environs. 

It will be of interest at this point to compare the EQ’s and AQ’s 
of these children with similar quotients of gifted children included in 
studies in which the incidence of Negro children has been negligible. 

Terman administered the Stanford Achievement Test to five
hundred sixty-five gifted children in California. He found that their
educational attainment was below that which was considered
consonant with their mental superiority. Witty studied the educational
attainments of one hundred gifted children, and his results
corroborated those of Terman. Gray and Hollingworth, however,
studied the comparative achievement, as measured by the Stanford
Achievement Test, of one group of gifted children enrolled in a special
class and of another in several schools in New York City. These
investigators found that the gifted children earned accomplishment
quotients of about one hundred. However, Cobb and Taylor set forth
the New Stanford Achievement Test results for a group of New York
City children enrolled in a special class, and Patrick investigated the
attainment of gifted children in segregated classes in Louisville. These
groups conformed to the general pattern for gifted children in their
performance on the Stanford Achievement Test: educational
retardation in terms of mental ability appeared. Coy also studied the
educational achievement in 1923 of a special class for gifted children;
the mean IQ of her group was only 128.5. In 1930 Coy reported
the accomplishment quotients (the mental ages were derived largely
from group test scores) of children of various intelligence quotient
levels. Included were the AQ’s of children whose IQ’s fall in the
intervals one hundred thirty-one to one hundred forty and one hundred
forty-one to one hundred fifty. These ratios also follow the general
ranged from one hundred twenty-nine to one hundred seventy-five;
the mean was one hundred forty-three. EA’s were obtained from the 
results of the Stanford Achievement Test. The lowest subject 
quotient was in arithmetic, as was the lowest quotient in the present
study. Achievement test data in the form of educational ages only 
were reported; the subject, educational and accomplishment quotients 
were computed by the writers from data presented in case studies. 
It is of special interest at this point to note that the mean accomplish- 
ment quotient of Proctor’s group was one hundred. 


TABLE IV.—COEFFICIENTS OF CORRELATION (PEARSON) BETWEEN MENTAL AGE,
CHRONOLOGICAL AGE, AND EDUCATIONAL AGE

Variable 1 = Mental Age
Variable 2 = Educational Age
Variable 3 = Chronological Age

r12 = +.90 ± .025        r12.3 = +.58
r13 = +.89 ± .028        r13.2 = +.50
r23 = +.87 ± .032        r23.1 = +.42


The findings of the studies noted above are assembled, for com- 
parison with the group here under consideration, in Table III. The 
form of the Stanford Achievement Test used in the older studies 
contained only four items (the reading, spelling, arithmetic, and 
language usage tests) which appear to be directly comparable with 
those of the New Stanford Test. Comparison of the gifted Negroes 
with the other groups at these points reveals, in general, striking 
similarity. A disparity, however, is found in one subject. The Negro 
group falls considerably below the other groups in both sections 
of the arithmetic test. Nevertheless, it is clear that the Negro group 
conforms to the general pattern, displaying educational superiority 
which permeates all subject-matter areas. 

The high coefficients of correlation reveal the reciprocal relation- 
ship of factors which are difficult to disentangle. With CA constant, 
the r between MA and EA is +.58; this lower relationship suggests 
that the mental tests and the educational tests may be measuring 
different but positively related aspects of growth. 
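The partial coefficients in Table IV can be checked, approximately, with the standard first-order partial correlation formula. The sketch also assumes that the ± values in Table IV are classical probable errors, 0.6745(1 − r²)/√N with N = 26; that reading is an inference from the numbers, not something the text states. The function names are illustrative.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)


# Zero-order coefficients from Table IV (1 = MA, 2 = EA, 3 = CA).
r12, r13, r23 = 0.90, 0.89, 0.87
N = 26  # assumed: the twenty-six children of Table II

r12_3 = partial_r(r12, r13, r23)  # MA and EA, CA held constant
r13_2 = partial_r(r13, r12, r23)  # MA and CA, EA held constant
r23_1 = partial_r(r23, r12, r13)  # EA and CA, MA held constant

for label, value in [("r12.3", r12_3), ("r13.2", r13_2), ("r23.1", r23_1)]:
    print(f"{label} = {value:+.2f}")
for label, r in [("r12", r12), ("r13", r13), ("r23", r23)]:
    print(f"PE({label}) = {probable_error_r(r, N):.3f}")
```

Run as written, it prints 0.025, 0.028 and 0.032 for the probable errors, matching Table IV, and +0.56, +0.50 and +0.35 for the partials. The first two partials are close to the tabled +.58 and +.50, while the third falls noticeably below +.42. Partial coefficients are sensitive to the third decimal of the zero-order r's, so recomputing from two-place values cannot settle which figure is exact.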

Anomalous is the finding that the group contains a disproportion- 
ately large number of girls (nineteen girls and only seven boys). 
Other investigators have found their groups to contain larger numbers 
of boys than of girls. The number of cases here is too small to warrant
generalization regarding the significance of this result. It is interesting
that the seven boys have a mean IQ of one hundred forty-nine, a
mean EQ of one hundred forty, and a mean AQ of ninety-four. These
means are somewhat higher than those for the entire group.

Variations in the records of individual children may often be
explained by background factors. Pupil 1 (see Witty and Jenkins
for a more complete account of this case), with an IQ of two hundred,
has a subject quotient of one hundred eighty-one in word meaning, 
one hundred seventy-two in physiology and hygiene, and only one 
hundred twenty-six in arithmetic computation. A detailed study 
of this child discloses the fact that she has a phenomenal vocabulary 
and is very precise in the use of words; she has a deep interest in 
science and expresses the desire to become a chemist. Her relatively 
low attainment in arithmetic may be caused by the conspicuous 
discrepancy between her mental age and her grade placement. Pupil
26, who has the lowest educational quotient in the group, is obviously
working far below her true potentialities. A combination of lack
of motivation and a very poor home, from the quantitatively measurable
as well as from the qualitative standpoint, contributes to her
low educational achievement. Pupil 2, having the highest educational
quotient as well as the highest accomplishment quotient, presents 
the unique case of the only pupil whose EA exceeds his MA. A study 
of the contributing factors reveals that this boy is highly motivated— 
he has a driving intellectual curiosity and the school has recognized 
his ability and given him unusual opportunities for development. Not 
only has he been accelerated (he completed the eighth grade at age 
ten years, six months) but his program has been enriched. His
parents have provided much encouragement and exceptional oppor- 
tunity for his educational development. 


SUMMARY OF FINDINGS 


This paper presents the performance on the New Stanford Achieve- 
ment Test of a group of gifted Negro children in grades three to eight 
of the Chicago public schools. The findings may be summarized 
briefly:

1. The mean IQ (Stanford-Binet) is 148.9; the mean EQ 133.7; and 
the mean AQ 91. 

2. The mean grade placement is 5.9 and the mean age of the group 
is nine years and ten months. The mean progress quotient is one 
hundred twenty. 

3. The average pupil has mastered the subject-matter (as measured 
by the New Stanford Achievement Test) 1.4 grades above his present 
grade placement and 3.3 grades in advance of the norm for
children of his chronological age. 

4. The highest subject quotients are in language usage (146.6) and 
in reading (143.8), and the lowest is in arithmetic computation (126.5). 

5. Statistical treatment of the data suggests that mental age 
contributes heavily to educational age. 

6. There is an unusually large number of girls in this group, the 
distribution including nineteen girls and seven boys. 


CONCLUSIONS 


Data secured from atypical groups, the members of which are all 
products of the same school system, must be interpreted with caution. 
The following conclusions are limited to this group of gifted Negro 
children and to Negro children from a strictly comparable milieu: 

1. Gifted Negro children may be found with about equal frequency 
at every grade- and age-level in the elementary school. 

2. Gifted Negro children demonstrate greatest educational
superiority in those highly “verbal” subjects which appear not to depend
greatly on school experience. 

3. Our gifted Negro children do not achieve educational attain- 
ment consonant with expectations based upon mental tests. In this 
respect, they conform closely to the pattern of other groups of the 
gifted. The Chicago schools should make some provision for the 
enrichment of the school experiences of these superior youngsters. 
Two studies (Gray and Hollingworth; Proctor) suggest that an
AQ of one hundred can be attained by gifted children.

4. The coefficients of correlation obtained between mental age (as 
measured by the Stanford-Binet) and educational age, as well as the 
general educational superiority of this group, which was selected 
solely on the basis of superior performance on the Stanford-Binet, 
suggest the essential validity of this test as an instrument for identify- 
ing potentially capable Negro pupils in the elementary school. 
THE ROLE OF INSIGHT IN PLANE GEOMETRY*


LYLE K. HENRY 


University of Iowa 


The purpose of the experiment herein reported was to test the 
hypothesis that the mental behavior observed in solving originals 
in geometry under controlled conditions could be adequately and 
correctly described as the operation of “insight.”

In order to test this hypothesis it was necessary to do three things: 

1. Determine, in so far as possible, adequate and satisfactory 
criteria of insight. 

2. Set up a controlled situation in geometry which offered oppor- 
tunity for the characteristics of insight to appear. 

3. Observe whether or not the characteristics of insight appeared 
universally, frequently, rarely, or not at all. 


THE CRITERIA OF INSIGHT 


The concept and use of the term “insight” in this study were based
on the following considerations: 

1. Insight, as a psychological term, is by no means new, having
been defined by Baldwin, for example, in 1901 as the “apprehension
of the more subtle and profound aspects of truth in a relatively
immediate and direct way.”

2. Insight has a place as a technical term outside the configurational
school and is assigned such meanings as “the knowledge of
significant relations and hidden resemblances,” Bentley; “seeing into”
or “understanding the situation,” Dashiell; “immediate process of
apprehension,” Warren and Carmichael.

3. Insight in the Gestalt literature is defined as “the appearance
of a complete solution with reference to the whole lay-out of the field,”
Koehler; “to understand or see into,” Ogden; “an organized response
at the level of conscious behavior,” Wheeler.

4. The matter of insight in human subjects has been little investigated
and is not well understood. Three studies in particular, those
of Alpert, Bulbrook, and Dunkelberger and Rumberger, were patterned
after Koehler’s experiments with the apes. Alpert found insight to be present
if the solution was reached but qualified the phenomenon as being
sometimes partial and also as sometimes gradual. In the other two
investigations cited the writers did not find that the behavior could
be adequately or properly described as the operation of insight.
Maier, in studying the concrete problem-solving behavior of humans
(as well as rats), found insight to appear in characteristic fashion and
concluded that the behavior could not be accounted for by the
principles of association and trial-and-error.

* This paper presents aspects of a Ph.D. thesis performed at the University of
Iowa under the direction of Dr. F. B. Knight.

5. Lists of criteria of insight by Yerkes, Bingham, Wheeler, and
Alpert rather generally establish as characteristics of insight the
transposition and application of principles in a novel situation, a type 
of behavior which takes into account natural interrelations in the 
situations and definite changes in the way the subject feels about the 
situation. 

In view of these considerations, insight, as used in this study, 
would involve the correct application, in a sudden and confident 
manner, of a principle in a problem situation. 


THE EXPERIMENT 


The experimental materials selected were fourteen theorems on
straight-line figures in plane geometry and nine geometric originals
which required the application of these theorems for their solutions.
Each theorem and each original was made up on a series of
three-by-five-inch cards which gradually elaborated the situation from
card A to card H, as follows:


Card A.—(A written statement of the conditions of the problem.) The medians
of a triangle.

Card B.—(A drawing of the figure.)

Card C.—(Elaboration of the figure by construction lines.)

Card H.—(A list of questions of the heuristic type which could be offered as
hints.)

Where is O located with reference to the total length of the median?
How far is the point from the vertex?

What is the locus of O?

How were points M and N located?

What is the length of NO?

What about length of OE in relation to AE?

How would you put in helping lines there?

How are the construction lines drawn?

Do we know that NO equals OD?

What about the significance of AEF and DC?
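The illustrative theorem on Card A is presumably the familiar result that the medians of a triangle meet in a point (O in the hints) lying two-thirds of the way from each vertex to the midpoint of the opposite side, which is what the first hints steer toward. The sketch below checks this with exact rational arithmetic: it intersects two medians and then locates that point on all three. The triangle's coordinates and the vertex names P, Q, R are hypothetical; only O corresponds to a label on the cards.

```python
from fractions import Fraction as F

Point = tuple[F, F]


def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def intersect(p1: Point, p2: Point, p3: Point, p4: Point) -> Point:
    """Intersection of line p1p2 with line p3p4 (lines assumed not parallel)."""
    d = (p1[0] - p2[0]) * (p3[1] - p4[1]) - (p1[1] - p2[1]) * (p3[0] - p4[0])
    a = p1[0] * p2[1] - p1[1] * p2[0]
    b = p3[0] * p4[1] - p3[1] * p4[0]
    x = (a * (p3[0] - p4[0]) - (p1[0] - p2[0]) * b) / d
    y = (a * (p3[1] - p4[1]) - (p1[1] - p2[1]) * b) / d
    return (x, y)


def fraction_along(start: Point, end: Point, p: Point) -> F:
    """Return t with p = start + t * (end - start); error if p is off the line."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    t = (p[0] - start[0]) / dx if dx != 0 else (p[1] - start[1]) / dy
    if (start[0] + t * dx, start[1] + t * dy) != p:
        raise ValueError("point does not lie on this line")
    return t


# Hypothetical triangle with vertices P, Q, R.
P, Q, R = (F(0), F(0)), (F(7), F(1)), (F(2), F(5))

# O is where the medians from P and from Q cross.
O = intersect(P, midpoint(Q, R), Q, midpoint(R, P))

# Position of O along each median, measured from its vertex; the third
# median is included, so this also checks that all three pass through O.
ratios = [fraction_along(v, midpoint(a, b), O) for v, a, b in [(P, Q, R), (Q, R, P), (R, P, Q)]]
print("O =", *O, "| ratios:", *ratios)  # O = 3 2 | ratios: 2/3 2/3 2/3
```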


These stimulus cards were presented in the above order through 
an exposure screen to the thirty-two tenth-grade students who were 
taking second-semester geometry at the University of Iowa Experi- 
mental School. These subjects were experimentally seasoned and 
possessed superior ability (ninety-fifth percentile) in geometry as
measured by the Iowa Academic Tests. The experiment required 
two one-hour periods for each pupil. The following instructions 
were read by the subject: 

“Through the opening in the screen you will see a series of cards 
presented one at a time. On each card you will see a statement or a 
figure representing a situation in geometry to which you are to react 
orally. No writing is necessary. 

“You are asked to tell what comes to your mind, and to speak
freely in response to certain questions. The questions to which you 
are to react are written on a separate card and will be given you before 
the start of the series of cards. 

“Some of the situations may be presented several times in a differ- 
ent way in order to get the relation between what is given and your 
response. In every case you are to respond with reference to the 
questions on a separate card. You will be informed when you have
stated the thing to be proved, and again when you have stated the 
proof. Some of the material will no doubt be too easy for you and 
some may be too hard. These facts are valuable to me in each case so 
please do your best at all times to tell what is in your mind. Feel 
free to volunteer any information that does not seem to be covered 
by the questions. 

“In fairness to the results it is necessary that everyone begins this 
experiment without any previous knowledge concerning it. This 
means that you must not discuss the experiment with anyone before 
the first of May. Please tell the experimenter if you are not willing 
to do this.” 

Questions on card: 

1. What are the relationships present in the situation? That is, 
what things appear to be true? 

2. What do you think is to be proved? 

3. How would you prove it? Make clear the principle involved. 

A complete record of each interview was made by the use of the 
Iowa Oral Language apparatus, a microphone-dictaphone recording 
technique, perfected by Betts and Greene.

In terms of the experimental materials, answers to the following 
questions were sought: 

1. Are geometry students able to state a theorem when its condi- 
tions are presented in written form? 

2. Are students able to apply the theorems which they readily 
state in proving other theorems and in proving originals which they 
have not hitherto seen? 

3. When does the adequate response occur in a problem situation 
with respect to the degree of elaboration of the stimulus pattern? 

(a) When the conditions are presented in written form? 

(b) If not in (a) will it occur when the figure is presented? 

(c) If not in (a) or (b) will it occur when necessary construction
lines have been added to the figure? 

(d) If not in (a), (b), or (c) will it occur after hints have been given
in the form of questions of the heuristic type? 

4. What is the effect of asking questions of the heuristic type as 
an aid in leading the pupil into the rôle of the discoverer?

Were insight a common experience, one would expect the following 
occurrences with reasonable frequency in this experiment: 
1. The correct transposition and application of known theorems
in simple originals which involve the known principles.

2. Evidence of a relation in the subject’s thinking between factors 
which are logically and geometrically related. 

3. A suddenness of perception and realization of facts and relation- 
ships of the solution accompanied by a change in the tempo and 
expression of the verbalizing behavior. 

4. A feeling of surety and confidence in the correctness of the 
solution. 


RESULTS 


The data available for consideration consisted of: 

1. Typewritten copies of the two one-hour interviews of each 
of the thirty-two geometry students. These copies contained well 
over one hundred thousand words of meaningful context, representing
the living, working voice of the thinker at the moment of thought. 

2. Data sheets showing when and where the adequate responses 
occurred in terms of the exposure card and the number of seconds 
required for the response. 

Since transfer and application of principles in a certain fashion 
are criteria of insight, it is necessary to observe how well the principles
themselves are known and understood. 

The performance of the subjects on both theorems and originals 
is presented in Table I. 


TABLE I.—PERCENTAGE OF PUPIL RESPONSES TO STIMULUS MATERIALS ON
THEOREMS AND ORIGINALS

                          Theorems                Originals
Card               Theorem    Theorem      Problem    Problem
                   stated     proved       stated     solved
A                   64.8        7.5         17.9        4.6
B                   26.9       10.5         34.0       27.4
D                     .3       14.2
H                    6.4       11.4         39.3       26.6
O                    1.6       28.1          8.8       32.3
Granted¹                       28.3
Subject O.K.²                                           9.1
Total              100.0      100.0        100.0      100.0

¹ The proof of the congruency theorems was postulated.
² Proved a fact that utilized the adequate solution of the problem.
It will be noted that in 64.8 per cent of the cases, subjects stated
the theorem immediately when its conditions were presented in written 
form as on Card A; 26.9 per cent more stated the theorem, not on 
Card A, but on Card B, or as soon as the figure representing the 
theorem was presented; in three-tenths of one per cent of the cases, 
the subjects failed to state the theorem on Card A, then failed on
Card B, but stated the theorem on Card D where the figure contained 
suggestive construction lines; in 6.4 per cent of the cases, subjects 
failed to give an adequate response on Cards A, B, and D but stated 
the theorem after the experimenter gave a suggestion orally from Card 
H while the subject was still looking at the figure. In only 1.6 per 
cent of the cases did subjects completely fail to state the theorem 
after the stimulus pattern had been elaborated through Cards A to H. 

In 28.1 per cent of the cases, subjects failed to give the proof 
of theorems which they could state in 98.4 per cent of the cases. 
This is rather surprising since the theorems were logically related 
in that proving one theorem involved applying another theorem in 
the experimental list. Likewise, in 32.3 per cent of the cases, the 
subjects could not apply the theorems they knew in solving a hitherto 
unseen original requiring the application of these theorems. 
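The 98.4 per cent figure is cumulative: it counts every response in which the theorem was stated on Card A, B, D, or H. A short sketch recomputes it, assuming that the bottom row of Table II gives the number of responses by the card on which the theorem was first stated (O meaning never stated). The variable names are illustrative.

```python
# Bottom ("Total") row of Table II: theorem responses, out of 438, by the
# card on which the theorem was first stated; "O" = never stated.
stated_counts = {"A": 284, "B": 118, "D": 1, "H": 28, "O": 7}
total = sum(stated_counts.values())

cumulative_pct = {}
running = 0
for card in ["A", "B", "D", "H"]:  # order in which the cards were exposed
    running += stated_counts[card]
    cumulative_pct[card] = 100 * running / total
    print(f"stated by Card {card}: {running}/{total} = {cumulative_pct[card]:.1f}%")
```

The last line gives 431/438, or 98.4 per cent, the figure used in the text; the complementary 1.6 per cent is the seven responses coded O.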


INTERRELATIONS OF ASPECTS OF PERFORMANCE ON RELATED FACTORS OF 
EXPERIMENTAL MATERIALS 


The functional relationships between specific aspects of perform- 
ance are more definitely set forth in scattergrams such as Table II. 


TABLE II.—THE RELATION BETWEEN THE ABILITY TO STATE A THEOREM AND THE
ABILITY TO GIVE ITS PROOF

                          Theorem stated
Theorem proved      A      B      D      H      O    Total
(card)
A                  33      -      -      -      -       33
B                  41      5      -      -      -       46
D                  52      5      1      4      -       62
H                  33      9      -      7      1       50
O                  84     18      -     17      4      123
Granted            41     81      -      -      2      124
Total             284    118      1     28      7      438
604 The Journal of Educational Psychology 


There was a total of four hundred thirty-eight pupil responses on
theorems. In two hundred eighty-four cases, subjects stated the
theorem on Card A, but eighty-four times the subjects could not prove
the theorem at all. Being able to state a theorem is clearly no index
of the pupil’s ability to prove it. However, it is very unlikely that
a subject can prove a theorem without aid unless he can state it
unaided.

TABLE III.—THE RELATION BETWEEN STATING A THEOREM AND SOLVING AN
ORIGINAL REQUIRING ITS APPLICATION

                          Theorem stated
Original solved     A      B      H      O   Not tested   Total
(card)
A                   4      9      -      -       -           13
B                  56     20      -      2       -           78
H                  64     10      2      -       -           76
O                  56     21      6      3       6           92
Subject O.K.       17      9      -      -       -           26
Total             197     69      8      5       6          285

Table III reveals the fact that the ability to state a theorem is a
necessary but not sufficient factor in solving an original based on
the theorem. In only two cases did subjects who could not state
the theorem solve the dependent original without hints from the
experimenter.

TABLE IV.—THE RELATION BETWEEN PROVING A THEOREM AND SOLVING AN
ORIGINAL BASED ON THE THEOREM

                                Theorem proved
Original solved    A    B    D    H    O   Granted   Not tested   Total
(card)
A                  -    1    2    -    1      9          -          13
B                 16    6    9    1   12     34          -          78
H                  6    8    9    8   29     16          -          76
O                  6    4   13   11   34     19          5          92
Subject O.K.       -    3    1    1    9     12          -          26
Total             28   22   34   21   85     90          5         285
Table IV brings out the point that it is not always necessary to
be able to prove a theorem in order to use it in solving an original.
However, the number of instances, thirteen out of two hundred
eighty-five, is not large enough to indicate an incongruity. Although
subjects sometimes solve originals by using a theorem that
they cannot prove, the chances are good that a subject who
cannot solve an original after hints have been given is also weak
on the proof of the underlying theorem. Further data of the nature
of Table IV show that using Theorem A, for example, in proving
Theorem B does not give assurance that Theorem A will therefore
be used in solving an original. However, the chances are very small
that subjects who are unable to use theorems in proving other theorems
will be able to use them in solving originals.

TABLE V.—THE RELATION BETWEEN USING A THEOREM CORRECTLY AND USING
THE SAME THEOREM INCORRECTLY IN SOLVING GEOMETRIC ORIGINALS

Card A B D H O Total
Correct use
A 6 6
B 3 8 11
D 0
H 2 4 2 8
O 6 8 14
[illegible] 9 15 24
[illegible] 1 1
Total 21 41 2 64

TABLE VI.—THE RELATION BETWEEN STATING THE PROBLEM OF THE ORIGINAL
AND SOLVING THE ORIGINAL

                    Problem of original stated
Original solved     A      B      H      O    Total
(card)
A                  10      1      1      1      13
B                  13     33     22     10      78
H                   3     26     43      4      76
O                  20     27     36      9      92
Subject O.K.        5     10     10      1      26
Total              51     97    112     25     285
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From Table V, it can be concluded that a correct application of a 
theorem in solving an original does not mean that the subject can 
always handle the concept adequately. 

It will be noted that in sixty-four instances, subjects used theorems 
incorrectly. In seventeen of the sixty-four cases, subjects used 
theorems correctly and without aid in one situation, and incorrectly 
in another. 

In regard to aspects of originals, Table VI indicates that the 
ability to perceive relationships in a figure tends to be related to 
the ability to solve the problem. For example, in forty-three of the 
two hundred eighty-five instances, subjects stated the problem and 
then gave the solution after a verbal hint had been given by the 
experimenter. Likewise, in thirty-three cases, both the problem 
and the solution were stated on Card B. However, perceiving a 
relationship and giving its geometrical proof represent different 
degrees of complexity of a mental function. While the problem of 
the original was given on Card A in fifty-one instances, in twenty of 
these the proof was not achieved even after hints had been given. 
Further, while subjects failed to state the thing to be proved in but 
twenty-five instances, they failed to give proofs in ninety-two cases. 


THE SUDDENNESS OF PERCEPTION AS EVIDENCED BY A CHANGE IN THE 
TYPE OF VERBALIZING BEHAVIOR OF THE SUBJECT 


In the solution of originals, one important factor for which the
experimenter watched was the “aha” experience in the form of
expressions of discovery, surprise and elation. This phenomenon should be
one of our best criteria of genuine insight so far as the factors of
suddenness and vividness are concerned.

TABLE VII.—RECORD OF “OH, I SEE” EXPERIENCES IN SOLVING ORIGINALS

The object of the experience                         Frequency
[illegible]                                               3
[illegible]                                               3
Recognition of parallel lines                             2
Recognition of a parallelogram                            2
[illegible]                                               1
Recognition of corresponding angles                       1
Recognition of vertical angles                            1
Recognition of line joining mid-points                    1
[illegible]                                               1
[illegible]                                               1
Misperception of congruent triangles                      1

Bulbrook has pointed out that the verbal behavior of the subject is a
good index to the insight experience. Table VII reveals the following
facts:

1. This sudden, enlightened response occurred but seventeen times 
in two hundred eighty-five problem situations. Had it occurred in 
each successful solution of the original the number would have been 
one hundred ninety-three. 

2. In but three cases was the solution of the problem the object 
of the experience. 

3. In eleven cases the experience was related to the recognition 
of a fact in the figure. 

4. In one case the experience was a “false alarm”: a pair of
triangles that were not congruent was declared congruent.


5. The experience was not characteristic of any particular problem 
or individual. 


EVIDENCE OF FAILURE ADEQUATELY TO CONSIDER THE CHARACTERISTICS 
OF THE SITUATION 


Further data on the nature of the subjects’ behavior are to be
found in the types of errors made, both on the theorems and on the 
originals. Table VIII furnishes this information. Attention should 
be called to the relatively large number of errors of types 3, 4, 6, 
8, 11, 14, 15, and 17. The names of the types of errors are self- 
explanatory. Most of the types of errors listed have the following 
characteristics: 

1. They violate the exactitude of thought and logical sequence 
which are considered outstanding objectives of geometry as well
as our criteria of insight. 

2. They do not take into consideration the characteristics of the 
situation. 

3. They appear to be responses to elements of situations rather 
than to relationships. 

4. They can hardly be considered “intelligent” errors. 
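The weight of the eight error types singled out above can be checked against the frequencies printed in Table VIII. The following minimal Python sketch transcribes those frequencies by type number (the labels of types 1, 4, 6, 8, and 16 are illegible in this copy, but their counts are legible), confirms the printed totals, and computes the share of all errors that falls in types 3, 4, 6, 8, 11, 14, 15, and 17. The names `TABLE_VIII`, `column_totals`, and `share_of_types` are illustrative only.

```python
# Frequencies transcribed from Table VIII: error type -> (Series A, Series B).
TABLE_VIII = {
    1: (5, 1), 2: (10, 8), 3: (17, 24), 4: (28, 0), 5: (12, 1),
    6: (30, 17), 7: (2, 4), 8: (21, 13), 9: (6, 2), 10: (7, 3),
    11: (12, 16), 12: (7, 2), 13: (13, 9), 14: (21, 5), 15: (36, 81),
    16: (10, 4), 17: (2, 29), 18: (1, 4), 19: (9, 10), 20: (6, 2),
}

# Error types the text singles out as relatively frequent.
SINGLED_OUT = (3, 4, 6, 8, 11, 14, 15, 17)


def column_totals(table):
    """Return (Series A total, Series B total) over all error types."""
    total_a = sum(a for a, _ in table.values())
    total_b = sum(b for _, b in table.values())
    return total_a, total_b


def share_of_types(table, types):
    """Return counts and fractions of each series' errors that fall in `types`."""
    total_a, total_b = column_totals(table)
    count_a = sum(table[t][0] for t in types)
    count_b = sum(table[t][1] for t in types)
    return count_a, count_b, count_a / total_a, count_b / total_b


if __name__ == "__main__":
    print(column_totals(TABLE_VIII))  # (255, 235), as in the Total row
    count_a, count_b, frac_a, frac_b = share_of_types(TABLE_VIII, SINGLED_OUT)
    print(count_a, count_b, f"{frac_a:.0%}", f"{frac_b:.0%}")  # 167 185 65% 79%
```

On these figures the eight types account for 167 of the 255 errors in Series A and 185 of the 235 in Series B.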


CONCLUSIONS 


The following conclusions were reached: 

1. With the correct application of known theorems to the solution 
of simple originals as a criterion, insight may be said to have operated 
in thirty-two per cent of the cases. By the use of hints the value was 
raised to fifty-eight per cent of the cases, representing the total number 
of solutions achieved by the subjects. The ability to note the
relationships present, that is, to state the fact to be proved in an
original, operated in fifty per cent of the cases. This value was stepped up to
ninety per cent by hints from the experimenter. 


TABLE VIII. LIST OF INADEQUATE RESPONSES TO EXPERIMENTAL MATERIALS

                                                                  Frequency
Type                                                          Series A  Series B
 1. [illegible] ..............................................       5         1
 2. Prove what is already given ..............................      10         8
 3. Wrong congruency theorem .................................      17        24
 4. [illegible] ..............................................      28         0
 5. Confused mumbling of terms ...............................      12         1
 6. [illegible] ..............................................      30        17
 7. Misstatement: triangle for parallelogram .................       2         4
 8. [illegible] ..............................................      21        13
 9. Loss of goal; forgetting what to prove ...................       6         2
10. Assumption of special case ...............................       7         3
11. Data stated but their application to problem not sensed ..      12        16
12. Principle not stated correctly in terms of figure ........       7         2
13. Recognition of isolated facts ............................      13         9
14. Principle correct but failure to apply it to figures .....      21         5
15. Data in figure interpreted contrary to data given ........      36        81
16. [illegible] ..............................................      10         4
17. Difficulty in interpreting or visualizing data given .....       2        29
18. Passively admitting of data ..............................       1         4
19. Lack of understanding after being told proof .............       9        10
20. Uncertainty about proof after having stated it ...........       6         2
Total ........................................................     255       235
Average per problem for the thirty-two subjects ..............    18.2      26.1

2. The following criteria of insight failed to receive a significant 
amount of supporting evidence: 

(a) The indication of a relation in the subject’s thinking between 
factors that were logically and geometrically related. The ability 
to respond to common factors in situations was not consistent from 
situation to situation. 

(b) The suddenness of appearance of the solution. The “Oh!
I see” experience occurred but seventeen times in two hundred eighty-five
problem situations, and in only three cases was the solution the
object of the occurrence.
(c) The feeling of surety and confidence in the correctness of the
solution. At the termination of a series of statements by the subjects,
they were frequently asked whether or not they had completed the
proof or why a certain thing was true. Their answers in most cases
revealed a lack of surety and often a tendency to change a reason
for a statement previously made.

3. The value of the data on the transfer of theorems as a criterion 
of insight is jeopardized by the fact that the same theorems were used 
incorrectly almost as often as correctly. 

4. The large number of errors due to failure to take into account 
the characteristics of the situation suggests a general lack of insightful 
behavior.

In view of these considerations the writer holds that while insight 
occurred in the solution of originals, it was not the robust, universal 
trait that characterized the rational solution of problems and was,
therefore, inadequate and unsuited to describe the typical behavior 
in this experiment. In a strictly scientific sense the writer’s char- 
acterizations of problem behavior are of necessity limited to the 
conditions of this study. However, the nature of the experimental 
conditions seems to insure a fair answer to the question relative 
to the presence or absence of insight in the solution of geometric 
originals. If insight fails to occur consistently under conditions 
which technically facilitate its occurrence (superior geometrical
ability and experience, favorable training, direction of the subject’s
attention to interrelations through instructions and manipulation
of the stimulus pattern), it seems fair to suspect that it would not occur
under mediocre or poor educational conditions. 
CHANGES IN THE ATTITUDES OF COLLEGE
STUDENTS 


W. J. BOLDT AND J. B. STROUD 


Kansas State Teachers College of Emporia 


INTRODUCTION 


The purpose of this investigation is to study the effect of college 
training upon the attitudes of college students respecting social, 
political, religious, and international questions. Stated more con- 
cisely, the aim of this study is to determine whether, as a result of 
college training, students become more liberal or more conservative, 
or remain unaffected in their attitudes toward these issues. While 
the scope of the present study is restricted, it seems most desirable 
to obtain some definite measures of what college training does to the 
students who come under its influence. 

We have pretty definite evidence of the influence of such training 
upon the student’s fund of knowledge in certain academic subjects; 
and we possess some statistical data which, while probably not always 
critically treated, show in a general way that a college education 
increases the earning power of the individuals who receive it. We 
have assumed that the function of the college goes much beyond 
this—that its students are trained for more useful citizenship, that 
their capacities for appreciation of cultural values are enhanced, and 
so on. In short, we have assumed that the college-trained man or 
woman excels those who have not had such training in many respects 
besides the knowledge of academic subjects and the ability to earn 
a livelihood. However, there has been little systematic effort to 
measure these effects. Experimental studies directed toward this 
end would be very timely. The study reported here represents in a 
small way an attack upon a certain phase of this central problem. 


PROCEDURE 


The measure of attitudes employed in this study is Harper’s test 
of Social Beliefs and Attitudes.¹ This test appears to have been
constructed according to the highest standards of measurement. It
was designed to measure conservatism-liberalism as expressed in 





1 Harper, Manly H.: Social Beliefs and Attitudes of American Educators.
Contributions to Education, No. 294, Teachers College, Columbia University. 
“certain fundamental social beliefs and attitudes.” That the test
is valid, that is, that it is a measure of conservatism-liberalism, is 
attested by several factors, two of which are mentioned here. A 
group of educators was rated on a ten-point scale by competent 
judges as to conservatism-liberalism on social questions. The median 
of the ratings correlated .76 ± .132 with the scores actually made
by these educators on the test. In the second instance, forty-seven 
judges were requested to inspect each of the seventy-one items or 
propositions in the test, and to state whether an affirmative answer 
is indicative of conservatism or liberalism. The average agreement 
of the judges for the entire list of items was over ninety-eight per cent. 
The reliability of the test as determined by correlating two sets of 
scores obtained three weeks apart from the same group of subjects 
was found to be .90 ± .013. The test may be regarded as satisfactory
both in validity and in reliability. 

This test was administered to seven hundred thirty-eight college 
students of the Kansas State Teachers College of Emporia in the Fall 
of 1933, this number comprising about sixty per cent of the total 
enrollment of the college. The subjects were selected at random; 
and the number is large enough to insure an adequate sampling. 

The plan of treating the results is as follows: The average test 
score was computed for each of the five classes—freshman to graduate. 
It is assumed that any progressive change in average performance from 
class to class would reflect the influence of college training. Of 
course, this assumption as it stands is not entirely justifiable, but 
supporting evidence will be introduced subsequently which, in the 
opinion of the authors, substantiates the claim. An attempt has 
been made to ascertain whether or not students who major in one 
group of subjects show any greater change on the test than those 
who major in some other subject group, and to determine whether 
or not there is any relationship between the amount of change and the 
number of hours taken in a particular group of subjects. For this 
purpose the curriculum was divided into three main groups: the social
sciences (psychology, sociology, economics, history, education, and
commerce); the physical and biological sciences; and the humanities
(English, languages, speech, art, music, etc.). Finally,
an effort has been made to determine whether or not attitudes and 
beliefs toward some social problems are changed more than those 
toward other problems. The following groups of test items were 
compared for this purpose, namely, those dealing with wealth and 
property rights, internationalism, politics, government ownership,
capital and labor, social problems, and religion. 


RESULTS 


Attention is called to the fact that a high score is associated with 
liberalism and that a low score is associated with conservatism. There 
are seventy-one items in the test; a score of seventy-one represents 
the highest possible liberal score, and a score of zero represents the 
highest possible conservative score. No implication is made as to 
which score is right or wrong. 

The average test score of the four college classes and of a small 
group of graduate students is given in Table I. 


TABLE I. RELATION OF LIBERALISM OF COLLEGE STUDENTS TO THE NUMBER OF
YEARS SPENT IN COLLEGE, TOGETHER WITH THE DIFFERENCES OF THE MEANS
AND THE PE OF THE DIFFERENCE, WHICH IS GIVEN JUST BELOW THE DIFFERENCE

                                     Difference between means (PE of the difference below)
Class              N        M   PE(M)     Freshman  Sophomore    Junior   Senior
Freshman         411    42.23    .237
Sophomore        106    45.81    .574         3.58
                                               .62
Junior            98    52.00    .527         9.77       6.19
                                               .58        .78
Senior           103    54.32    .456        12.09       8.51      2.32
                                               .51        .73       .69
Graduate          20    56.70   1.080        14.47      10.89      4.70     2.38
                                              1.19   [remaining PEs illegible]
Grand average    738    45.85    .218

It is clear, therefore, that the scores increase progressively with 
each succeeding level of college attainment. Furthermore, these 
differences are highly significant statistically. 
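Whether a difference is significant is judged here by its ratio to the PE of the difference. The following minimal Python sketch recomputes all ten comparisons from the class means and PEs in Table I, assuming the standard formula for independent samples, PE of the difference = sqrt(PE1² + PE2²). It agrees with the legible printed PEs of the differences to within about .01, except that it gives 1.11 rather than the printed 1.19 for the freshman-graduate comparison. The names `CLASSES`, `pe_of_difference`, and `critical_ratios` are illustrative.

```python
import math
from itertools import combinations

# Class means and probable errors of the means, as printed in Table I.
CLASSES = [
    ("Freshman", 42.23, 0.237),
    ("Sophomore", 45.81, 0.574),
    ("Junior", 52.00, 0.527),
    ("Senior", 54.32, 0.456),
    ("Graduate", 56.70, 1.080),
]


def pe_of_difference(pe1, pe2):
    """PE of a difference between two independent means: sqrt(PE1^2 + PE2^2)."""
    return math.hypot(pe1, pe2)


def critical_ratios(classes):
    """Yield (lower class, higher class, D, PE_D, D / PE_D) for every pair of classes."""
    for (name1, m1, pe1), (name2, m2, pe2) in combinations(classes, 2):
        diff = m2 - m1
        pe_d = pe_of_difference(pe1, pe2)
        yield name1, name2, diff, pe_d, diff / pe_d


if __name__ == "__main__":
    rows = list(critical_ratios(CLASSES))
    for name1, name2, diff, pe_d, ratio in rows:
        print(f"{name2} - {name1}: D = {diff:5.2f}, PE_D = {pe_d:.2f}, D/PE_D = {ratio:5.2f}")
    print("ratios above 4:", sum(r[4] > 4 for r in rows))  # 7
    print("ratios below 3:", sum(r[4] < 3 for r in rows))  # 1
```

The one ratio below 3 is the senior-graduate comparison, at about 2.0.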

In seven of the ten comparisons made the differences between 
the means are more than four times the PE of the difference, and in 
only one comparison is the difference less than three times its PE. 
Since the test employed seems to possess satisfactory validity, we are 
warranted in concluding that the sophomores are more liberal than 
freshmen in their attitudes toward social, religious, and political 
problems, and the like; and that each succeeding class is more liberal 
than the preceding one. These results as they stand do not prove 
that the increment in liberalism is the direct result of college training
since the students who are now upper classmen may have been more 
liberal when they entered college than the present freshmen. How- 
ever, the probability that such selective factors would produce a 
graded series of advancements for each of the five class levels is very 
slight indeed. The authors are inclined to attribute the results to 
the influence of college training. 

This interpretation is supported by the fact that changes in atti- 
tudes, as indicated by the test, are dependent upon the course of 
study pursued. The results show that the students who are majoring 
in the social sciences manifest a more liberal tendency than those 
majoring in any other group of subjects. Furthermore, there is a 
direct relationship between the number of semester hours of credit 
obtained in the social sciences and the tendencies toward liberalism 
as expressed in the test scores. The scores of students majoring 
in the social sciences are compared with those majoring in the physical 
sciences and in the humanities. Since a good many of the students
take majors in more than one field, the combinations are also consid- 
ered. The results are given in Table II. 


TABLE II. COMPARATIVE SCORES OF STUDENTS MAJORING IN ONE OF THE THREE
MAIN DIVISIONS INDICATED OR IN COMBINATIONS OF TWO DIVISIONS

Division                                      N   Mean score      PE
Social science                      [illegible]        53.28     .94
Social science-humanities                   101        51.87     .56
Social science-physical science              87        51.66     .73
[illegible]                         [illegible]        49.10     .88
[illegible]                         [illegible]        48.18     .81
Humanities-physical science                  20        45.80    1.35

The data presented in the foregoing table show a decided tendency 
for social science students, either those majoring in this group or those 
majoring in the social science group and in one of the other groups, to 
incline toward liberalism as compared with students majoring in the 
other groups. The differences between the scores of students majoring 
in the social science group or in a social science group plus one of the 
other groups and the scores of students majoring in a non-social 
science group or in a combination of non-social science groups are 
statistically significant. All of the differences are more than three 
times the PE of the difference, and about half of them are four or
more times the PE of the difference. 

The foregoing comparisons are based upon declarations of majors, 
not upon completion of majors. Ordinarily only the seniors had 
completed their majors, or were near the completion of them. The 
data contain no cases below the sophomore level. The differences 
between the mean scores of students majoring in subjects other than 
the social sciences are not statistically significant. This fact is 
probably what one should expect. There is no apparent reason why 
students pursuing such courses as mathematics, physics, chemistry, 
English, Latin, and art, should undergo as a result thereof any sig- 
nificant changes in their attitudes toward the social problems dealt 
with in this test, since such courses, as ordinarily taught, do not touch 
upon these issues in any very vital way. 

The influence of studying the social sciences upon the attitudes of 
the students is shown to best advantage by the inspection of the test 
scores of juniors and seniors who have had varying degrees of training 
in these subjects. The results are given separately for each class, 
thereby keeping the influence of class advancement constant. The 
results are shown in Table III. 


TABLE III. RELATIONSHIP BETWEEN NUMBER OF HOURS TAKEN IN SOCIAL
SCIENCES AND THE TEST SCORES

Class              N   Number of hours   Mean score      PE
Junior            21              5-19        47.29    1.00
Junior            41             20-34        51.54     .78
Junior   [illegible]             35-up        55.18     .79
Senior            30             10-29        51.80     .79
Senior            43             30-49        54.67     .71
Senior            27             50-up        57.15     .78

These results show a significant relationship between the number of 
hours of work taken in the social sciences and the scores on the test, 
with the factor of the number of years of college training held constant. 
In both classes the difference between the mean scores of the students
with the smallest and with the largest number of hours is more than four
times the PE of the difference. In all cases the chances range from
ninety-four to ninety-nine in one hundred (according to the usual 
statistical interpretation) that there is a true difference. These 
data taken with those presented above show rather conclusively that 
training in the social sciences at the institution in question makes for
higher scores on the test used, and in all probability for greater
liberality in attitudes toward the religious, social, political, and
economic problems involved, than does a similar amount of training
in courses outside the social science group.
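The statement that the chances of a true difference range from ninety-four to ninety-nine in one hundred can be made concrete. The sketch below assumes a common convention of the period, which the text does not spell out: the difference is converted from PE units to standard-deviation units (1 PE = 0.6745 sigma) and read one-tailed from the normal curve, giving the chances that the true difference between adjacent hour groups in Table III is greater than zero. Treat this as an assumption about the authors' procedure; the helper names are hypothetical.

```python
import math

PE_TO_SIGMA = 0.6745  # one probable error equals 0.6745 standard deviations


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chances_of_true_difference(m1, pe1, m2, pe2):
    """Chances in 100 that the true difference m2 - m1 is greater than zero.

    Assumes independent groups, PE_D = sqrt(PE1^2 + PE2^2), and a one-tailed
    normal-curve reading of D / PE_D after conversion to sigma units.
    """
    diff = m2 - m1
    pe_d = math.hypot(pe1, pe2)
    z = PE_TO_SIGMA * diff / pe_d
    return 100.0 * normal_cdf(z)


# Hour groups from Table III, in order of increasing hours: (mean, PE).
JUNIORS = [(47.29, 1.00), (51.54, 0.78), (55.18, 0.79)]
SENIORS = [(51.80, 0.79), (54.67, 0.71), (57.15, 0.78)]

if __name__ == "__main__":
    for label, groups in (("Juniors", JUNIORS), ("Seniors", SENIORS)):
        # Compare each hour group with the next larger one.
        for (m1, pe1), (m2, pe2) in zip(groups, groups[1:]):
            chance = chances_of_true_difference(m1, pe1, m2, pe2)
            print(f"{label}: {m1:.2f} -> {m2:.2f}: {chance:.1f} in 100")
```

Under these assumptions the adjacent comparisons come out at roughly 99 and 99 in 100 for the juniors and 97 and 94 for the seniors; the comparisons between the smallest and largest hour groups are higher still.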

Finally, attention is called to the relative influence of college
training upon the attitudes of college students toward the various problems or
issues which comprise the test. Table IV shows the percentage of
liberal answers made by each class to each group of homogeneous
test items.


TABLE IV. PERCENTAGE OF LIBERAL ANSWERS MADE TO EACH GROUP OF TEST ITEMS

                                                 Percentages
Group of items                Freshman   Sophomore   Junior   Senior   Graduate
Wealth and property rights          68          70       75       77         90
Internationalism                    59          63       69       66         77
Politics                            72          76       85       88         90
Government ownership                52          53       58       63         60
Capital and labor                   56          61       74       78         85
Social problems                     71          76       85       85         91
Religion                      36, 43, 60, 39 (one entry illegible)

Approximately sixty-five per cent of the answers made to the 
entire test by the whole population tested were liberal answers. 
Inspection of Table IV reveals not only the group of items in which
the greatest liberality was expressed but also the groups in which the 
greatest change was manifested as a result of college training. 

In order to illustrate the nature of the questions and in order to 
show the change in attitude toward the questions from one class to 
another some of the typical test items and the percentage of liberal 
answers made by each class are given below. 


The development of the highest welfare of the country will require government
ownership of the land.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        20          20       20       23         30

One should never allow his own experience and reason to lead him in ways that
he knows are contrary to the teachings of the Bible.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        38          50       66       84         65

Without directly teaching religion a teacher’s influence in the public schools
should always be definitely and positively favorable to the purposes and activities
of our generally recognized religious organizations.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        10          10       20       21         20

In the industries of this country proper opportunity and encouragement are
usually given to laborers to progress from lower to higher positions of all grades
of responsibility and reward.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        39          52       70       70         70

Our educational forces should be directed toward a more thoroughly socialistic
order of society.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        35          50       56       69         70

The power of huge fortunes in this country endangers democracy.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        87          88       87       93         95

Reproduction should be made impossible, by segregation or by surgical
operation, for all those below certain low standards of physical and mental fitness.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        88          89       94       97        100

Licenses to teach in the public schools should be refused to persons believing
in socialism.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        75          86       93       96        100

Among the poor, many more individuals fall short of highest satisfaction on
account of too many desires than on account of lack of income.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        44          52       64       58         79

If any facts should be found favorable to socialism they should be omitted from
histories written for high school use.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        88          87       98       95         85

As a rule, the laborer in this country has as favorable an opportunity to obtain
a fair price for his labor as his employer has to obtain a fair price for the goods
which the laborer produces.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        48          58       71       77         85

Many more industries and parts of industries should be owned and operated
cooperatively by the producers (all the workers) themselves.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        62          62       72       84         85

No school, college, or university should teach anything that is found to result
in its students doubting or questioning the Bible as containing the word of God.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        32          51       75       80         74

The government should provide to all classes of people opportunity for
insurance at cost against accident, sickness, premature death, and old age.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        53          55       71       53         80

The man whose vacant lots in a thriving city increase many fold in value
because the city’s homes and business grow up around those lots, should, in justice,
be required to repay in taxes a large part of the unearned profits to the city that
created the increased values.

Class                            Freshman   Sophomore   Junior   Senior   Graduate
Percentage of liberal responses        46          51       48       61         70


SUMMARY 


The results of this investigation indicate that the attitudes of 
the college students tested become more liberal, as a result of their 
training, toward the issues involved in the test. Much of the change 
manifested appears to be due to the influence of the college life rather 
than to differences in age and maturity. This interpretation is 
substantiated by the fact that the amount of change in attitudes in 
question from one class level to another is a function of the particular 
academic courses pursued and by the fact that a direct relationship 
exists between the extent of change in attitudes and the number of
hours of work taken in the social sciences.
THE RELIABILITY AND VALIDITY OF
PHOTOGRAPHIC EYE-MOVEMENT RECORDS IN THE
READING OF LATIN¹


OLIVIA FUTCH 
Bryn Mawr College 


INTRODUCTION 


Four measures are commonly tabulated from photographic 
records of eye movements in reading, namely: Average number of 
fixations per line, average number of regressive movements per 
line, average duration of fixation pauses, and average perception 
time per line. In a number of recent studies of eye movements in 
reading, attention has been given to the evaluation of the various 
measures from the standpoint of their self-correlations, indicating 
reliability, their intercorrelations, showing the degree of similarity 
between the different measures, and their correlations with other 
reading tests, indicating to some extent their validity as measures of 
reading ability. In connection with a detailed, analytical study of 
eye movements in the reading of Latin, the author made a similar 
evaluation of the various measures in the reading of a foreign language. 
At the same time, further data were secured concerning their sig- 
nificance in the reading of English, making possible comparisons 
between the two languages. Correlations were also calculated to 
show the extent to which the various measures of eye movements are 
dependent upon perceptual ability and general intelligence, known 
to be important factors in determining reading ability. It is, there- 
fore, the purpose of this paper to present data (1) concerning the 
reliability of records of eye movements in the reading of Latin and 
English and (2) concerning the validity of the various measures of 
eye movements in the reading of Latin and English, as shown by their 
intercorrelations and by their correlations with tests of compre- 
hension of the material read during eye movement photography, with 
perception tests, and with general intelligence tests. 





1 The data in this study are from a Ph. D. thesis submitted to the faculty of 
Bryn Mawr College. The original study entitled ‘‘A Study of Eye Movements in 
the Reading of Latin” is on file in the Bryn Mawr College Library. The author 
is indebted to Dr. Harry Helson for critical guidance in the laboratory work and 
in the preparation of the manuscript and to Dr. Ilse Forest and Dr. Lelah Mae 
Crabbs for assistance and counsel throughout the experiment. 
Photographic records were made of the eye movements of twenty-seven
ninth grade Latin students in the reading of paragraphs from a
simple Latin story and of English translations of passages from the 
same story. The subjects were from one private school and two 
public schools in the vicinity of Bryn Mawr College.² The average
IQ³ of the group was one hundred nineteen, the range being from
one hundred to one hundred forty-seven. All had begun the study
of Latin in grade eight, but some had covered more work than others.
Scores on the New York Latin Achievement Test⁴ indicated that the
students represented in one grade level a wide range of ability in 
Latin. After a period of preliminary training designed to give the 
students more uniform preparation for the reading tests, the groups 
from the three schools were found, by an informal objective examina- 
tion, to be approximately equal in their knowledge of the specific 
items of vocabulary, forms, and syntax included in the reading material 
to be used in the experiment. 

The students were given practice in reading under the conditions 
of eye movement photography before the actual records were made. 
They were directed to read at their usual rate and to read in order to 
understand the meaning of the paragraph without necessarily working 
out an exact translation. In the preliminary exercise, the subjects 
were also given a comprehension test similar to the tests to be used 
in the experiment proper. Each of these tests consisted of five 
multiple-response questions in English concerning the content of the 
Latin paragraph. Since six tests were used in the course of the 
experiment, the entire comprehension test contained thirty questions. 
The results of this test served as the chief guide in the interpretation 
of the eye movement records. After all the photographs had been 
made, the experimenter secured a second check on comprehension 





1 A copy of the reading material is given in the appendix to the thesis on file in 
the Bryn Mawr College Library. 

2 The schools participating in the experiment were the Baldwin School, Bryn 
Mawr; the Lower Merion Junior High School, Ardmore; and the Radnor Junior 
High School, Wayne. The author acknowledges the excellent cooperation of the 
superintendents, teachers, and students of these schools in making the research 
possible. 

3 Intelligence quotients were based upon the results of at least two group tests,
the Terman Group Test of Mental Ability and the Otis S-A, Intermediate Examina- 
tion being used most frequently. 

4 Thompson, H. G. and Orleans, J. S.: “The New York Latin Achievement
Test.” New York State Department of Education, Bulletin, No. 892, 1928.
by having the students write the translations of two of the Latin
paragraphs. The translation test and the New York Latin Achieve- 
ment Test were used as criteria in validating the comprehension test. 

Ability in perception of English words was determined by
administering the Gates Visual Perception Test A3.¹ This test contains
words of increasing length arranged by pairs in columns. Certain 
pairs are alike, while others differ in one letter only. The student 
is required to underline the pairs that are different as rapidly as 
possible. Both speed and accuracy count in the scoring. This test 
was chosen because of its high correlation with reading tests² and
because, unlike tests of the visual apprehension span, it had not been 
used previously in connection with eye movement studies. The 
author constructed a similar test of Latin words in order to test 
ability to discriminate between words in a foreign language. 


RESULTS AND DISCUSSION 


The reliability coefficients in Table I show that there is a high
degree of consistency in the various measures of eye movements in
the reading of both Latin and English. Since some of the photographs
were not clear, only twenty-one paired records were available for
each language. The results are comparable to those obtained in
previous studies of eye movements in the reading of English.
Litterer,³ Eurich,⁴ and Tinker and Frandsen⁵ have reported r’s ranging
from .80 ± .02 to .90 ± .01 and above for each of the measures of
eye movement. When different types of material are used in making
the correlations, as in studies by Walker⁶ and Tinker,⁷ very much





1 Gates, A. I.: The Improvement of Reading. New York, Macmillan, 1929, 
p. 394. 

2 Gates, A. I.: “A Study of the Rôle of Visual Perception, Intelligence, and
Certain Associative Processes in Reading and Spelling.” Journal of Educational
Psychology, Vol. XVII, 1926, pp. 433-445.

3 Litterer, Oscar F.: “An Experimental Analysis of Reading Performance.”
Journal of Experimental Education, Vol. I, 1932, pp. 28-33.

4 Eurich, Alvin C.: “The Reliability and Validity of Photographic Eye-movement
Records.” Journal of Educational Psychology, Vol. XXIV, 1933, pp. 118-122
and 380-384.

5 Tinker, M. A. and Frandsen, A.: “Evaluation of Photographic Measures of
Reading.” Journal of Educational Psychology, Vol. XXV, 1934, pp. 96-100.

6 Walker, R. Y.: “The Eye-movements of Good Readers.” Psychological
Review Monographs, Vol. XLIV, 1933, p. 108.

7 Tinker, M. A.: “Photographic Measures of Reading Ability.” Journal of
Educational Psychology, Vol. XX, 1929, pp. 184-191.
lower coefficients are obtained between repeated readings by the same
subjects. This is true of ordinary tests of speed and comprehension 
as well as of measures of eye movement. The reliability of the eye 
movement records compares favorably with the reliability of the 
standardized Latin test and of the tests of ability to comprehend 


TABLE I. SHOWING THE RELIABILITY OF EYE-MOVEMENT RECORDS AND TESTS OF
COMPREHENSION AND TRANSLATION FOR NINTH GRADE LATIN STUDENTS

                                                        Original         Estimated by Spearman-Brown formula
Measure                                            N    r ± PE           r ± PE
Latin (one paragraph with another)
  Average number fixations per line               21    .84 ± .04        .91 ± .03
  Average number regressions per line             21    .88 ± .03        .94 ± .02
  Average duration of fixation pauses             21    .71 ± .07        .83 ± .04
  Average perception time per line                21    .89 ± .03        .94 ± .02
English (one paragraph with another)
  Average number fixations per line               21    .74 ± .07        .85 ± .04
  Average number regressions per line             21    .67 ± .08        .80 ± .05
  Average duration of fixation pauses             21    .84 ± .05        .91 ± .03
  Average perception time per line                21    .80 ± .05        .89 ± .03
Comprehension test (first half with last half)    27    .60 ± .08        .75 ± .05
Translation test (one paragraph with another)     24    .93 ± .03        .96 ± .02
New York Latin achievement test                  125    .84 ± .02        (reported by authors)

and to translate the material read during eye movement photography. 
The comprehension test has a lower coefficient of reliability than the 
other measures, but the r of .75 is high enough to make it a satisfactory 
criterion for evaluating the measures of eye movement. 
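The "estimated" column of Table I is consistent with the two-fold Spearman-Brown correction, r′ = 2r/(1 + r), applied to each original coefficient. A minimal Python sketch, assuming that formula and the conventional probable error of r, PE = 0.6745(1 − r²)/√N (neither formula is printed in the source), recomputes the estimated column from the original one and the probable error of the half-test comprehension coefficient:

```python
import math

def spearman_brown(r: float, k: float = 2.0) -> float:
    """Reliability predicted when a test of reliability r is lengthened k-fold."""
    return k * r / (1 + (k - 1) * r)

def probable_error_r(r: float, n: int) -> float:
    """Conventional probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# (original r, estimated r) pairs as printed in Table I
TABLE_I = {
    "Latin fixations": (0.84, 0.91),
    "Latin regressions": (0.88, 0.94),
    "Latin pause duration": (0.71, 0.83),
    "Latin perception time": (0.89, 0.94),
    "English fixations": (0.74, 0.85),
    "English regressions": (0.67, 0.80),
    "English pause duration": (0.84, 0.91),
    "English perception time": (0.80, 0.89),
    "Comprehension test": (0.60, 0.75),
    "Translation test": (0.93, 0.96),
}

for name, (r_orig, r_printed) in TABLE_I.items():
    r_est = spearman_brown(r_orig)
    print(f"{name:25s} {r_orig:.2f} -> {r_est:.2f} (printed {r_printed:.2f})")

# Probable error of the half-test comprehension coefficient (r = .60, N = 27)
print(f"PE of .60 with N = 27: {probable_error_r(0.60, 27):.3f}")
```

The sketch only checks the coefficients; it does not try to reproduce the printed probable errors of the stepped-up estimates, whose method of computation the text does not state.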

The intercorrelations of the various characteristics of eye move- 
ments follow very nearly the same trends in the reading of Latin and 
English. The figures in Table II show that in the reading of Latin 
the highest correlation appears between number of fixations and 
number of regressions, with that between number of fixations and 
perception time in second place. In the reading of English the order 
of size in the two highest correlations is reversed, showing agreement 
with results of other studies. The greater proportion of regressions 
in reading Latin probably accounts for the higher correlation between 
fixation frequency and regression frequency in this language. Tinker
and Frandsen¹ similarly found a higher proportion of regressive
movements and a higher correlation between number of fixations 
and number of regressions in reading objective examination questions 
than in ordinary reading. Correlations involving duration of fixation 
pauses have been found to vary widely from group to group and from 
one type of reading to another. The variable character of the corre- 
lation between duration of fixation pauses and number of fixations is 


TABLE II. SHOWING THE INTERCORRELATIONS OF THE VARIOUS MEASURES OF
EYE MOVEMENTS IN THE READING OF LATIN AND OF ENGLISH

(N = 27)                                               Latin               English
Measures correlated                                    r ± PEr             r ± PEr
Number of fixations and number of regressions          [illegible] ± .01   .88 ± .03
Number of fixations and duration of fixation pauses    [illegible] ± .11   .63 ± .07
Number of fixations and perception time                [illegible] ± .03   .91 ± .02
Duration of fixations and perception time              [illegible] ± .05   .78 ± .05

shown in this study by the marked difference between the results 
for Latin and English. The correlations between duration of fixation 
pauses and total perception time, however, are in close agreement in
the reading of the two languages. They are also higher in this study
than in the investigations reported by Tinker¹ and Eurich¹ and are more
nearly comparable to the correlation given for a high school group in a
study by Schmidt.²

In the reading of Latin as in the reading of English, the best measure 
of speed is total perception time, since this measure includes all of the 
reading time except the small portion required by the interfixation 
movements. Number of fixations, because of its high correlation 
with perception time, is also a good measure of speed and may be so 
used in cases where economy of time in tabulating the records is a
matter of importance. Number of regressions and duration of fixation 
pauses appear to be fair measures of speed, but they are more important 
in showing the quality and characteristic patterns of special types of 
reading and in revealing centers of difficulty in comprehension. Very 
long fixation pauses, sometimes lasting four or five seconds, were 
frequently found in the reading of Latin. From the introspections of 





1 Op. cit. 
2 Schmidt, W. A.: “An Experimental Study in the Psychology of Reading.”
Supplementary Educational Monographs, Vol. II, 1917, p. 44.
several subjects, it seems that long fixations are definitely associated
with an effort to recall the meaning or function of a word in trying
to interpret a passage by means of translation. Regressive move-
ments reflect difficulties in understanding relationships between words.
In an investigation of the rôle of word order as a source of difficulty
in reading Latin, it was found that problems of word order have a
greater effect upon number of regressive movements than upon any
other measure of eye movements.¹
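The four measures discussed above can be tabulated from a fixation-by-fixation record of each line. The Python sketch below is only an illustration: the `Fixation` record, its field names, and the sample values are hypothetical and are not the author's tabulating procedure. Following the text, perception time is taken as the sum of the fixation pauses alone, excluding the interfixation movements. The sketch also makes explicit why number of fixations and perception time correlate so closely: perception time per line equals fixations per line multiplied by the average pause.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    duration_s: float  # length of the fixation pause, in seconds (hypothetical field)
    regressive: bool   # True if a backward (regressive) movement led to this fixation

def eye_movement_measures(lines: list[list[Fixation]]) -> dict[str, float]:
    """Average the four per-line measures over every line of one record."""
    n_lines = len(lines)
    fixations = [f for line in lines for f in line]
    # Perception time counts only the pauses, not the movements between them.
    total_pause = sum(f.duration_s for f in fixations)
    return {
        "fixations_per_line": len(fixations) / n_lines,
        "regressions_per_line": sum(f.regressive for f in fixations) / n_lines,
        "mean_pause_s": total_pause / len(fixations),
        "perception_time_per_line_s": total_pause / n_lines,
    }

# Hypothetical two-line record
record = [
    [Fixation(0.25, False), Fixation(0.5, False), Fixation(0.25, True)],
    [Fixation(0.5, False), Fixation(0.5, False)],
]
measures = eye_movement_measures(record)
print(measures)
```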

The data in Table III bring out a number of interesting points 
from the correlations of the Latin comprehension test with other tests 

TABLE III. SHOWING THE CORRELATIONS BETWEEN THE LATIN COMPREHENSION
TEST AND OTHER TESTS AND BETWEEN THE LATIN COMPREHENSION TEST
AND THE VARIABLES OF EYE MOVEMENTS IN THE READING OF
LATIN AND ENGLISH

Comprehension test correlated with          r ± PEr                     N
New York Latin achievement test             .82 ± .05                   27
Translation test                            .70 ± .06                   24
Intelligence quotient                       .51 ± .09                   27
Eye-movement measures                       Latin         English
  Average number fixations per line         .01 ± .14     −.40 ± .10    27
  Average number regressions per line       .06 ± .13     −.30 ± .11    27
  Average duration of fixation pauses      −.34 ± .11     −.62 ± .08    27
  Average perception time per line         −.18 ± .12     −.63 ± .08    27
Test of visual perception of words          .52 ± .10      .64 ± .08    25

and with the various measures of eye movements in the reading 
of Latin and English. The New York Latin Achievement Test, requir- 
ing one hour and twenty minutes of testing time, includes a wide 
sampling of specific phases of Latin knowledge and skill usually taught 
in first-year Latin. The high correlation, .82, between this test and 
the author’s test of Latin comprehension, which required only nine 
minutes of testing time (one and one-half minutes for each set of five 
questions) after the reading during eye movement photography, shows 
that the performance of the students under the conditions of the 
experiment was very similar to their achievement in the usual class- 
room test. The correlation of .70 with the translation test furnishes 
further evidence of the validity of the comprehension test as a criterion 





1 See detailed study for description of procedure and results. 
for the evaluation of the eye movement records. The correlation
of .51 with IQ shows that the more intelligent students tend to make 
the higher comprehension scores. 

There seems to be no significant relationship between any measure 
of eye movements in the reading of Latin and comprehension of Latin.
The highest coefficient, −.34, is less than four times its probable error,
±.11. The data show that records of eye movements may not be
used alone as measures of Latin achievement. Two students making 
identical comprehension scores may show extreme variation in methods 
of reading. The true relationships between eye movements and 
comprehension of Latin may be obscured by differences in the methods 
by which the students in the three schools were taught to read, and 
by apparent changes in the character of eye movements in different 
stages of development of ability to read a foreign language. Data 
obtained in this experiment and in a study by Buswell¹ indicate that
there is an increase in the analytical character of reading after the 
earliest stages and later a decrease. Our group included students at 
different stages of development in reading ability although they were 
all classified in the first semester of ninth grade Latin. Under more 
favorable circumstances the correlations might be higher, but, from 
results obtained in studies of eye movements in the reading of English, 
it appears that they would still be only moderately significant. The 
r’s reported between various measures of eye movements and tests 
of speed and comprehension range from −.02 ± .10 found in a study
by Eurich² to −.67 ± .06 found in a study by Starch.³ The majority
appear in the interval from −.40 ± .07 to −.50 ± .06. Although
these correlations show a fairly substantial trend of relationship, they 
are not high enough to make it possible to draw inferences concerning 
an individual’s comprehension from his eye movement records. 
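The criterion used above, that a coefficient must be at least four times its probable error before it is treated as significant, can be applied mechanically to the Latin column of Table III. A short Python sketch, assuming that 4 × PE rule is the only test applied:

```python
# Latin column of Table III: comprehension test correlated with each measure, as (r, PE)
LATIN = {
    "fixations per line": (0.01, 0.14),
    "regressions per line": (0.06, 0.13),
    "duration of fixation pauses": (-0.34, 0.11),
    "perception time per line": (-0.18, 0.12),
}

def pe_ratio(r: float, pe: float) -> float:
    """Number of probable errors by which the coefficient differs from zero."""
    return abs(r) / pe

for measure, (r, pe) in LATIN.items():
    ratio = pe_ratio(r, pe)
    verdict = "significant" if ratio >= 4 else "not significant"
    print(f"{measure:30s} r = {r:+.2f}  r/PE = {ratio:.2f}  {verdict}")
```

Even the largest Latin coefficient, −.34, lies only about 3.1 probable errors from zero.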

The validity of eye movement records as measures of reading 
ability does not depend entirely upon correlations with measures of
comprehension. The photographic records are in themselves signifi-
cant in measuring and comparing the motor habits of the eyes in
various types of reading. A comprehension test measures a result,
a completed product; an eye movement record measures a process while
it is going on. The two types of tests supplement each other admirably


1 Buswell, G. T.: A Laboratory Study of the Reading of Modern Foreign Lan- 
guages. New York, Macmillan Company, 1928, p. 48. 

2 Op. cit. 

3 Starch, D.: Educational Psychology. New York, Macmillan, 1928, p. 317.
in an analysis of reading ability. In the present experiment, whether
a student’s comprehension score was high or low, the eye movement 
records contributed information of definite diagnostic value concerning 
his methods of reading. 

The data in Table III show that there is a very substantial relation- 
ship between eye movements in the reading of English and compre- 
hension of Latin. The negative coefficients indicate that the students 
making few fixations and short pauses in the reading of English tend to 
make the higher comprehension scores. The tests of ability to dis- 
criminate between words rapidly and accurately also show a marked 
correlation with the Latin comprehension test. The coefficients of 
−.63 and .64 furnish evidence of the possible value of eye movement
records and of perceptual tests in the prognosis of Latin achievement.
The correlations are closer than those reported by Allen¹ and Clem² for
any single measure they used in trying to predict achievement in 
first-year Latin. They found that the Briggs Analogies Tests gave a 
consistent average correlation of .50 with Latin achievement tests and 
that the addition of a number of other factors raised the correlation to 
about .70. The number of cases in the present study is too small to 
permit definite conclusions, but the importance of eye movement 
habits and of perceptual ability in determining Latin achievement is 
clearly suggested by our data. Further importance should be attached 
to these findings in view of the correlations found between the various 
characteristics of eye movements in the reading of Latin and English. 
The correlations presented in Table IV show that there is a marked 

TABLE IV. SHOWING THE CORRELATIONS BETWEEN THE VARIOUS MEASURES
OF EYE MOVEMENTS IN THE READING OF LATIN AND ENGLISH

Correlation between Latin and English          r ± PEr
Average number fixations per line              .39 ± .10
Average number regressions per line            [illegible]
Average duration of fixation pauses            .68 ± .07
Average perception time per line               .51 ± .09

positive trend of relationship between all measures except number of 
regressions in the reading of the two languages. 

Moderate correlations were found between eye movements in the 
reading of English and ability to perceive small differences between 


1 Allen, W. S.: A Study in Latin Prognosis. New York, Bureau of Publications,
Teachers College, Columbia University, 1923. 

2 Clem, Orlie M.: Detailed Factors in Latin Prognosis. New York, Bureau of 
Publications, Teachers College, Columbia University, 1924. 
English words. The negative coefficients in Table V indicate that as
ability to discriminate between words increases, the number of fixa- 
tions, duration of fixation pauses, and perception time required in 
reading, decrease. The correlations between the Latin perception 
test and eye movements in the reading of Latin are much lower. 

TABLE V. CORRELATIONS BETWEEN THE GATES PERCEPTION TEST A3 AND EYE
MOVEMENTS IN THE READING OF ENGLISH AND BETWEEN THE LATIN
PERCEPTION TEST AND EYE MOVEMENTS IN THE READING OF LATIN

                                                         English        Latin
Measures correlated                                      r ± PEr        r ± PEr
Perception of words and number of fixations per line     −.45 ± .12     −.10 ± .15
Perception of words and number of regressions per line   −.35 ± .13     −.11 ± .15
Perception of words and duration of fixation pauses      −.54 ± .10     −.29 ± .13
Perception of words and perception time per line         −.56 ± .10     −.26 ± .13

The figures show that perceptual ability has a more decided influence 
upon eye movements in reading a familiar language than in reading 
an unfamiliar language. Ability to perceive small differences between 
words probably has some effect upon eye movements in the reading of 
Latin, but other factors, such as central thought difficulties, evidently 
have a more important influence. 

Certain relationships between general intelligence as measured 
by IQ and eye movements in the reading of Latin and English are 


TABLE VI. SHOWING THE CORRELATIONS BETWEEN IQ AND EYE-MOVEMENT
MEASURES IN THE READING OF LATIN AND ENGLISH

(N = 27)                                       Latin          English
IQ correlated with                             r ± PEr        r ± PEr
Average number fixations per line              .06 ± .13     −.33 ± .11
Average number regressions per line            .62 ± .09     −.21 ± .12
Average duration of fixation pauses           −.05 ± .13     −.40 ± .10
Average perception time per line               .08 ± .12     −.37 ± .11

summarized in Table VI. In the case of Latin the only coefficient 
of any significance is that between IQ and number of regressions, 
and this, being positive, means that in reading Latin the more intelli- 
gent students tend to make more regressions than the less intelligent 
students. Apparently they are more intent upon gaining the exact 
meaning of what they read and, therefore, re-check their impressions 
more carefully. The coefficients between IQ and eye movements in
the reading of English are low, but they show a slight tendency on the
part of the more intelligent students to make fewer fixations and
regressions and to make shorter pauses than the less intelligent students.
These coefficients are very similar to those reported by Eurich¹ between
the various measures of eye movements and general intelligence as 
measured by the Pressey Classification Test, Intermediate Examination. 
General intelligence apparently gives a more significant indication of a 
student’s ability to comprehend Latin than of his methods of reading. 
It is possible that if all students had been given the same type of 
training higher correlations might appear between general intelligence 
and eye movements. 


SUMMARY 


Data presented in this study show that photographic eye movement 
records of the reading of Latin exhibit a high degree of reliability. 
The r’s obtained from self-correlations compare favorably with those 
found in eye movement studies of English reading and with those 
reported for the best standardized tests. Intercorrelations of the 
various measures of eye movement follow the same general trends in 
the reading of Latin and English, showing that the measures may be 
similarly interpreted in the two languages. No significant relation- 
ship was found between eye movements in the reading of Latin and 
comprehension of Latin. The photographic records alone are not 
valid measures of Latin achievement. When used in connection with 
tests of comprehension, however, they furnish important information 
concerning the process of learning to read a foreign language and aid in 
the diagnosis of difficulties. Records of eye movements in the reading 
of English and perception tests appear to be of value in the prognosis of 
Latin achievement. They show substantial correlations with both
comprehension of Latin and eye movements in the reading of
Latin. The coefficients between the perception tests and eye move- 
ment records are of moderate size in the case of English and low in the 
case of Latin. The abilities measured by the two types of tests are 
correlated, but not identical. General intelligence is more closely 
correlated with comprehension of Latin than with eye movements in 
the reading of Latin. Differences in the type of training given in the 
three schools cooperating in the experiment probably obscure to some 
extent the true relationships between eye movements and general
intelligence and between eye movements and comprehension of Latin.





1 Op. cit. 
FACTORS TO BE CONSIDERED IN MEASURING THE
RELIABILITY OF A MENTAL TEST, WITH SPECIAL 
REFERENCE TO THE MERRILL-PALMER SCALE 


RACHEL STUTSMAN 
The Merrill-Palmer School, Detroit 


Current statistical measures of reliability are often regarded, 
unwarrantably, as absolute criteria of the adequacy or inadequacy of a 
mental test. A number of factors must be considered, for the index 
itself will vary according to the method of judging the reliability 
of the test. Thus, the method of dividing the test into two approxi- 
mately equal parts may yield a different index than the method of 
repeating the test after a short interval; and giving two nearly similar 
forms of the same test may yield an index different from both the 
other methods.

Further, the group tested is important. Any factor affecting the 
dispersion of the group ability influences the size of the reliability 
coefficient. Age range, variation in social level, length of time between 
the two tests, and range in ability within the group are all factors of 
this nature. 

Further complications arise when the reliabilities of two different 
tests are compared, for each test inevitably reflects the conditions 
of its standardization. The method of determining the score and the 
variability and age range of the original standardization group all 
subtly, yet significantly, affect later interpretations of reliability. 
One current but questionable practice is to compare the reliability 
of two different tests by comparing the correlation of two scores on 
the one test, repeated after a short interval, with a similar correlation 
on the other. Often the correlations are obtained on materials that 
are not comparable. Even if the two tests are similar, the age range, 
the time interval between tests, and the range in ability may differ 
sufficiently to make the comparison inaccurate. 

An analysis of the problem of establishing the reliability of the 
Merrill-Palmer Scale of Tests for Preschool Children will illustrate 
some of these difficulties. 

The Merrill-Palmer scale is used with many types of children—in 
nursery schools drawing children from different social levels, in 
clinics where many varieties of homes are represented, in institutions 
for retarded or defective children, in orphanages, and in varying types 
of research. In such a situation, what should be the method of
determining reliability? If the test is to be used with children varying
so widely in type, the reliability of the test should be determined with
a group showing a similar dispersion of talent. With a less varied
group of children, the reliability coefficient obtained would be smaller
and less satisfactory. 
The age range is another factor to be considered. The age range
for which the Merrill-Palmer test is recommended is two to five
years. Necessarily, the actual range is somewhat broader, including
children under two years and, in the case of defectives, considerably
above five years. It is often assumed that age should be kept constant
in determining reliability, yet the policy followed is not always stated.
Certainly this factor cannot be assumed to be constant in studies
of the reliability of tests for adults, nor is the age factor given special
attention in such studies.

Constancy in the age factor increases in importance as the ages
of the children to be tested lessen, and is greatest at the preschool
age. At the school age it may be sufficiently accurate to determine
the reliability of a test for children in a certain grade, though the age
factor is certain to be inconstant and the coefficient will be influenced
by this inconstancy. In most tests a variation of one year at ages
nine to ten will not affect the coefficient of correlation as much as a
similar range at ages three to four. Thus, for two hundred forty-three
Merrill-Palmer nursery school children between the ages three and
four, the correlation between chronological age and score is r =
0.566 ± 0.029, a correlation sufficiently high to make it imperative
that with the Merrill-Palmer test, at least, constancy in age range
shall mean a variation of considerably less than one year.

Nowhere in the literature does there appear a discussion that
adequately solves the problem of a suitable criterion for age range. A
device that practically eliminates the age factor in measuring the
reliability of the Merrill-Palmer test is the conversion of the raw
score into a standard deviation score determined for the specific age
of the child. Such a coefficient of reliability cannot, of course, be
compared with one obtained in any other way. For a group of two
hundred seven Merrill-Palmer children between the ages of two and
five years, the correlation between the standard deviation scores on
first and second tests, at time intervals varying from six to nine months,
is r = 0.588 ± 0.031 (Pearson Product-Moment Formula).¹
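The conversion described above expresses each raw score as a deviation from the mean of the child's own age group, in units of that group's standard deviation, and then correlates the two sets of standard scores by the product-moment formula. A minimal Python sketch follows; the age norms, the `sd_score` helper, and the five children are invented for illustration, and the actual Merrill-Palmer norm tables are not reproduced here.

```python
import math
from statistics import mean

# Hypothetical age norms, NOT the Merrill-Palmer tables:
# age at testing (months) -> (mean raw score, standard deviation of raw scores)
NORMS = {36: (20.0, 4.0), 42: (24.0, 4.0), 48: (28.0, 5.0)}

def sd_score(raw: float, age_months: int) -> float:
    """Express a raw score in standard deviation units for the child's age group."""
    age_mean, age_sd = NORMS[age_months]
    return (raw - age_mean) / age_sd

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented records: (age at test 1, raw score 1, age at test 2, raw score 2)
children = [
    (36, 24.0, 42, 28.0),
    (36, 16.0, 42, 20.0),
    (36, 20.0, 42, 24.0),
    (42, 24.0, 48, 33.0),
    (42, 20.0, 48, 23.0),
]
# Each score is standardized against the norms for the age at which it was obtained.
first = [sd_score(raw, age) for age, raw, _, _ in children]
second = [sd_score(raw, age) for _, _, age, raw in children]
print(f"reliability of standard deviation scores: r = {pearson_r(first, second):.3f}")
```

Because every score is referred to the norms for the age at which it was obtained, the child's growth between the two tests does not by itself raise the coefficient.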
This low coefficient may be raised rather strikingly by a slight
shift in method. Thus, if the time interval between the two tests
is limited to two months, the other conditions being the same, the
coefficient for seventy-seven Merrill-Palmer children is 0.720 ± 0.036.
Another shift in method will also give a higher correlation. Thus, if
the standard deviation scores on the second and third retests,
given at time intervals varying from six to nine months, are correlated,
the result is r = 0.714 ± 0.031. That is, the coefficient of reliability
may be raised either by decreasing the time interval between the tests 
or by giving the subject an initial experience with the test before he 
undertakes the two performances compared in measuring reliability. 
This method is similar to that of giving fore- or trial-tests before giving 
those on the basis of which reliability is computed. 

The reliability of a test is often judged solely, and mistakenly, on 
the basis of the size of the coefficient of reliability, irrespective of the 
factors involved in arriving at this coefficient. A much less adequate 
method of measuring the reliability of the Merrill-Palmer test gives a 
result more satisfying to those who judge the value of a test entirely 
in terms of the size of the reliability coefficient. This method, the 
correlation of raw scores, retaining the age factor, shows a correlation 
of r = 0.953 ± 0.007, for eighty-one Merrill-Palmer nursery school
children ranging in age from two to five, who were retested with a time
interval of two months or less. For a similar group of sixty-four
children, with a time interval of six months or less between the tests,
the correlation was r = 0.914 ± 0.014. These figures, much higher
and more satisfying from the point of view of the typical estimate of 
reliability, are in fact less accurate as a measure than are the coefficients 
arrived at by use of the standard deviation score, with the age factor 
eliminated. 
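Why the raw-score coefficients run so much higher can be seen with a small artificial example. In the Python sketch below, all numbers are invented: four age groups differ widely in mean score, while within each age test and retest agree only moderately. Correlating raw scores lets the age differences dominate; correlating standard deviation scores for the child's age removes them. For simplicity the sketch assumes each child is the same age at both tests.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical norms: age in years -> (mean raw score, standard deviation)
NORMS = {2: (10.0, 2.0), 3: (20.0, 2.0), 4: (30.0, 2.0), 5: (40.0, 2.0)}
# Within every age group, children deviate from the age mean by these amounts
TEST_DEV = [-2.0, 0.0, 2.0]
RETEST_DEV = [0.0, -2.0, 2.0]

ages, raw1, raw2 = [], [], []
for age, (age_mean, _) in NORMS.items():
    for d1, d2 in zip(TEST_DEV, RETEST_DEV):
        ages.append(age)
        raw1.append(age_mean + d1)
        raw2.append(age_mean + d2)

# Standard deviation scores relative to each child's own age group
z1 = [(x - NORMS[a][0]) / NORMS[a][1] for a, x in zip(ages, raw1)]
z2 = [(y - NORMS[a][0]) / NORMS[a][1] for a, y in zip(ages, raw2)]

r_raw = pearson_r(raw1, raw2)
r_sd = pearson_r(z1, z2)
print(f"raw scores r = {r_raw:.3f}; standard deviation scores r = {r_sd:.3f}")
```

Here the within-age agreement is the same in every group, yet the raw-score coefficient comes out near .99 against .50 for the standard deviation scores, which mirrors the contrast drawn in the paragraph above.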

When the interval between tests is reduced to one month, there is in 
many cases a marked practice effect, the score on the second test 
showing as much increase as would ordinarily occur after a much 
longer interval of time. The scores on tests given a group of forty-six 
three- to five-year-old Merrill-Palmer children at intervals of one 
day to one month showed a correlation of r = 0.851 ± 0.028. This
slightly lower correlation, as compared with the result on the group 
of eighty-one children described in the preceding paragraph, may be 
explained by the smaller age range and the shorter interval between 
tests, increasing the practice effect. 
Another method by which the influence of the age factor is reduced,
that is, a comparison of the raw scores on two tests of children in the
year range between three and four, gives a coefficient similar to that
obtained by the method of standard deviation scores. This method,
for fifty-six children between the ages of thirty-six and forty-seven
months, tested twice within the twelve-month period at intervals
varying from one to eleven months, showed a correlation of r =
0.715 ± 0.045. Though the number of cases is small, the following
data may give further light on complicating variables: Scores on tests
given a group of twenty-six children between the ages of three and four,
at an interval of one month, show a correlation of r = 0.722 ± 0.064,
which is practically the same as that for the group of fifty-six children. 

All these correlations represent a rather narrow sample of children 
of the ages studied, for all the records were those of Merrill-Palmer 
nursery school children. These children have a median percentile 
score of seventy-five; very few have scores below the average of the 
standardization group of six hundred thirty-one children, who repre- 
sented a much wider socio-economic range; and their median Stanford 
Binet IQ is around one hundred nineteen. Since the standardization 
group showed a much wider range of talent and this range, in turn, 
markedly influences the correlation coefficient, a true measure of 
reliability of the test cannot be looked for on the basis of records of the 
Merrill-Palmer children alone. 

These difficulties and the variety of coefficients arrived at by 
different methods of measuring the reliability of the Merrill-Palmer 
test illustrate the factors to be considered in connection with a specific 
test. The age range, type of measure, range in talent, and interval 
between tests must all be given careful consideration in relation to the 
test. 

















REGRESSION AND STANDARD ERROR CALCULATIONS
WITHOUT THE CORRELATION COEFFICIENT


H. BRONFIN AND S. M. NEWHALL
Institute of Human Relations, Yale University 


Statistical elaboration of data in various sciences frequently 
involves the calculation of a regression coefficient and its standard 
error, or a regression equation and the standard error of estimate. 
There is no need here to exemplify the usefulness of these familiar 
formulae but their specific meanings may be quickly reviewed. The 
regression coefficient, $b_{yx}$, is a measure of the slope of the least-squares
straight line of best fit; the standard error of the regression coefficient,
$\sigma_{b_{yx}}$, is of course the standard error of the slope; the regression
equation, $\tilde{y} = b_{yx}\,x$, is the equation of the best-fit line for the y values and
is used in predicting most probable y values when x values are given;
the standard error of estimate, $\sigma_{y\cdot x}$, is the usual measure of the error of
such predictions.

Methods for calculating these four measures have been evolved 
which can be applied without the computation of the correlation 
coefficient.¹ There are two reasons why this elimination of r from the
formulae is, under certain circumstances, a distinct advantage. First,
these other measures serve definitely different purposes than the
correlation coefficient and frequently one or more of them are required
in actual practice when the correlation coefficient is not needed. Second,
discriminating use of these formulae may often afford a substantial
saving in time. 

The formulae for the regression coefficient, standard error of the 
regression coefficient, regression equation, and standard error of 
estimate are given herewith in two forms. The first employs devia- 
tions from the mean and is better adapted to hand methods; the other 
involves direct computation from the original measures and may be 
preferable for use with a calculating machine. Both forms of the 





1 The writers make no claim to priority in developing formulae which eliminate 
r. The deviation formula for the slope is to be found in the works of various 
authors; Ezekiel also gives an original score form of the slope formula. But 
possibly the present formulae for the standard error of the regression coefficient 
and the standard error of estimate are new since a search of the literature failed 
to reveal them. 













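derivations are based on the familiar formulae $r = \dfrac{\Sigma xy}{N\sigma_x\sigma_y}$ and $\sigma_{y\cdot x} = \sigma_y\sqrt{1-r^2}$. With $x$ and $y$ taken as deviations from their means, they give:

$$b_{yx} = \frac{\Sigma xy}{\Sigma x^2} \qquad \text{Regression coefficient}$$

$$\sigma_{b_{yx}} = \frac{\sqrt{\Sigma x^2\,\Sigma y^2 - (\Sigma xy)^2}}{\Sigma x^2\,\sqrt{N}} \qquad \text{Standard error of the regression coefficient}$$

$$\tilde{y} = \frac{\Sigma xy}{\Sigma x^2}\,x \qquad \text{Regression equation}$$

$$\sigma_{y\cdot x} = \sqrt{\frac{\Sigma x^2\,\Sigma y^2 - (\Sigma xy)^2}{N\,\Sigma x^2}} \qquad \text{Standard error of estimate}$$

In terms of the original measures $X$ and $Y$, the formulae reduce to:

$$b_{YX} = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2} \qquad \text{Regression coefficient}$$

$$\sigma_{b_{YX}} = \frac{\sqrt{\left[N\Sigma X^2 - (\Sigma X)^2\right]\left[N\Sigma Y^2 - (\Sigma Y)^2\right] - \left[N\Sigma XY - \Sigma X\,\Sigma Y\right]^2}}{\left[N\Sigma X^2 - (\Sigma X)^2\right]\sqrt{N}} \qquad \text{Standard error of the regression coefficient}$$

$$\tilde{Y} = b_{YX}\left(X - \frac{\Sigma X}{N}\right) + \frac{\Sigma Y}{N} \qquad \text{Regression equation}$$

$$\sigma_{Y\cdot X} = \frac{\sqrt{\left[N\Sigma X^2 - (\Sigma X)^2\right]\left[N\Sigma Y^2 - (\Sigma Y)^2\right] - \left[N\Sigma XY - \Sigma X\,\Sigma Y\right]^2}}{N\sqrt{N\Sigma X^2 - (\Sigma X)^2}} \qquad \text{Standard error of estimate}$$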


These regression and standard error derivations for y and Y could 
obviously be applied to x and X by merely making throughout the 
appropriate substitutions of x or X for y or Y, and of y or Y for x or X.
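The deviation and original-measure forms can be checked against each other and against the usual route through r. The Python sketch below assumes the large-sample convention with N (not N − 2) in the standard errors, that is, $\sigma_{y\cdot x} = \sigma_y\sqrt{1-r^2}$ and $\sigma_{b_{yx}} = \sigma_{y\cdot x}/(\sigma_x\sqrt{N})$; the function names and the six data pairs are invented for illustration.

```python
import math

def via_deviations(X, Y):
    """b, its standard error, and the standard error of estimate from deviation sums, without r."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sxx = sum((x - mx) ** 2 for x in X)
    syy = sum((y - my) ** 2 for y in Y)
    sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    resid = sxx * syy - sxy ** 2  # Σx²Σy² − (Σxy)²
    return sxy / sxx, math.sqrt(resid) / (sxx * math.sqrt(n)), math.sqrt(resid / (n * sxx))

def via_original_measures(X, Y):
    """Same three quantities from sums of the original measures (calculating-machine form)."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    a = n * sum(x * x for x in X) - sx ** 2             # NΣX² − (ΣX)²
    c = n * sum(y * y for y in Y) - sy ** 2             # NΣY² − (ΣY)²
    d = n * sum(x * y for x, y in zip(X, Y)) - sx * sy  # NΣXY − ΣXΣY
    resid = a * c - d ** 2
    return d / a, math.sqrt(resid) / (a * math.sqrt(n)), math.sqrt(resid) / (n * math.sqrt(a))

def via_r(X, Y):
    """The usual route through the correlation coefficient, for comparison."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in X) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in Y) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (n * sd_x * sd_y)
    sigma_est = sd_y * math.sqrt(1 - r * r)
    return r * sd_y / sd_x, sigma_est / (sd_x * math.sqrt(n)), sigma_est

X = [2, 4, 5, 7, 9, 11]   # invented data
Y = [3, 6, 5, 9, 10, 14]
for route in (via_deviations, via_original_measures, via_r):
    print(route.__name__, [round(v, 4) for v in route(X, Y)])
```

All three routes print the same slope, standard error of the slope, and standard error of estimate; only the number of arithmetic operations differs.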

There is reason to believe that time and effort can often be saved 
through use of these formulae which eliminate r. For purposes of 
comparative estimate, several problems were worked through with 
these formulae and also with the usual forms including r, from which
they were derived. In each instance the computer used a Monroe 
calculator. The deviation formulae without r were found to reduce 
the number of operations required. A ten per cent saving in time was 
revealed in application of the deviation formulae for the regression 
coefficient, its standard error, and also for the standard error of esti- 
mate when that was included. If a machine were not available and hand
methods were used, the relative saving would be considerably greater. 
There was a thirty-eight per cent time saving in the application of 
the raw score formulae for the regression coefficient, its standard error, 
and the standard error of estimate. The saving in the first two of 
these alone was forty-five per cent. Clearly, these particular data 
must be regarded as nothing more than very rough indications of 
general tendency. Saving effected in a given instance may be expected 
to vary with such factors as the nature of the problem, idiosyncrasy 
of the computer, and type of calculating machine. 


BOOK REVIEWS 


A. G. BILLS. General Experimental Psychology. New York: Longmans,
Green & Co., 1934, pp. X + 620.


As a reviewer I frequently ask myself the questions: “Should
I be proud to be the author of this work? Will this book add to the
reputation of the author?” Both questions in regard to Bills’s “General
Experimental Psychology” can be answered with an enthusiastic
affirmative. It is a mine of information conveyed in a calm, dis- 
passionate, almost nonchalant manner. In many places it is summary 
in character; and since not a single sentence is wasted to make it 
easier to read, concentrated attention is needed for most of the chap- 
ters. The style is somewhat pedestrian but quite adequate for the 
purpose. Sentences such as “It is interesting that removal of the 
lens”’ (p. 127) detract from the pleasure one gets in reading the book. 

When Experimental Psychology is thought of, one usually conjures 
up visions of whirring discs, chronoscopes and the like. This book, 
while dealing faithfully with the time-honored experiments of the 
brass-instrument era, is more catholic in outlook than other texts 
in the field. It is divided into six parts labelled as follows: Sensory 
Processes; the Perceptual Process—Space Perception; Learning and 
Memory; Association and Thought; Work and Fatigue; Emotional 
and Affective Processes. There are two appendices, one dealing with 
Statistical Methods and the other with Psychophysical Methods. 
The one on statistics could have been usefully eliminated as it seems 
better for the student to go to a fuller treatment of the subject when 
the need arises. Since so much space is devoted to Learning and
Memory and other topics of genuine interest to the teacher, it savours
of educational psychology to a remarkable extent. The book is
well printed, well illustrated and well documented. It can be strongly 
recommended as a thoroughly sound piece of work. 

PETER SANDIFORD. 


BARKEV S. SANDERS. Environment and Growth. Baltimore: Warwick
and York, Inc., 1934, pp. XVIII + 375.


In this volume those interested in the problem of heredity and 
environment will find an able and exhaustive refutation of some of the 
extreme claims of the hereditarians. The study limits its subject- 
matter to physical growth, particularly of height and weight, thereby 
attacking the Pearson school on its own ground. After criticising
the validity of the correlation technique as a measure of degree of 
inheritance, the author examines the evidence concerning the influence 
of certain biological and parental factors and of several environmental 
factors on physical growth in prenatal, infant, preschool, school and 
adult periods of life. The environmental factors studied include 
country of birth, nutrition, housing, employment and medical care. 
Although “Environment and Growth” is more limited in scope
than its title would convey and is concerned chiefly with physiological
traits and influences, psychologists will find much stimulating and
interesting material in the book. The separate treatment of the
various periods of development is especially clarifying, revealing, for
example, that the correlations between the height of parents and
children, commonly reported as about .50, hold only for adult
offspring. In infancy this correlation is negligible, giving rise to the
suggestion “that the greater similarity of parents and adult offspring
may be due in part to their common environment and to selection to
which the mature offspring have been subjected.” Dr. Sanders
concludes that “if environmental differences are important enough to
affect physical growth, it is most probable that they affect
psychosocial adaptations and behavior as well.”
LAURANCE F. SHAFFER.
Carnegie Institute of Technology.


HAROLD S. TUTTLE. A Social Basis of Education. New York:
Thomas Y. Crowell Company, 1934, pp. VII + 589.


This book emphasizes the motives and outcomes of learning. It
is devoted to the thesis that education can best serve its social purpose
by consciously cultivating social interests and motives. It differs 
from the usual type of treatise in the field of educational sociology 
in that it is not merely a compilation of data and interpretations. 
With due consideration of the value and importance of knowledge 
and skill in learning, the author proposes a theory of social adjustment 
which has for its primary function the cultivation of those affective 
states of mental life that lead to the most desirable and permanent 
satisfactions in life. These satisfactions, both in kind and quality, 
he believes, constitute a series of experiences in life and should be 
thought of as a sort of hierarchy whose higher forms are competitors 
of the lower forms and differ from them in the degree of richness of 
satisfactions. According to this theory, the aim of education consists
in the enlargement and promotion of the enduring values of life.
Attitudes and appreciations are significant and are to be cultivated
and provided for by the school. The function of the school is to
cultivate the capacities of children for those satisfactions which designate
a better life. These capacities for satisfaction are to be attained
through a project or activity curriculum or some
modification of it, although the author states rather specifically that
there is no best way of attaining these outcomes at present. The 
curriculum must provide for those social experiences whose aims are 
distinctly independent of school tradition. The teacher is a social 
engineer rather than a near-sighted taskmaster and technician. The 
book contains much to commend it to the reader. The author’s 
treatment of the aims of education, the learning process, discipline, 
the teacher, and other familiar educational subjects is reasonable and 
suggestive of new trends of thought. The supporters of the activity 
curriculum will find the treatise especially helpful in understanding 
their problems. Teachers of the traditional school will find inspiration
in every chapter.
ROBERT G. SIMPSON.
Carnegie Institute of Technology. 


ARTHUR LICHTENSTEIN. Can Attitudes Be Taught? Baltimore: The 
Johns Hopkins Press, 1934, pp. IX + 89.


This study reports an attempt to measure the influence of
education on two attitudes, scientific openmindedness and preference for
outdoors over movies. The two attitudes were stressed in connection
with the teaching of approximately nine hundred pupils in the
intermediate grades. Three tests were used to measure scientific
openmindedness. Ballots and diaries were used as criteria in determining
preference for outdoors as against movies. It was found that
superstitions were significantly reduced, while social attitudes, scientific
attitudes, and preference for outdoors were not materially influenced.
Differences in grade, age, intelligence, and sex appeared to have no 
significant bearing on the results of the experiment. Findings of 
previous investigators are carefully reviewed. Bibliography. 

GLEN U. CLEETON. 
Carnegie Institute of Technology. 


FREDERICK L. DEVEREUX. The Educational Talking Picture. Chicago:
University of Chicago Press, 1933, pp. 222 + XII.


That the talking picture had educational possibilities was obvious
from the outset. It has already been used at practically all
educational levels. An evaluation as well as a description of the
possibilities of the talking picture for educational purposes is now
in order. The book, The Educational Talking Picture, is an
attempt at such a description and evaluation. It is written by Col.
Frederick L. Devereux, vice-president of Erpi Picture Consultants,
in collaboration with a staff of research workers from his own company
and with professors from Teachers College, Columbia University, and other
educators. Topics considered include: the production, utilization,
and appraisal of educational talking pictures; the use of such pictures
at the college and university level and at the adult level; and types of
equipment and standards of selection.
H. MELTZER.
Psychological Service Center, Saint Louis, Missouri.